Preface


This book is the result of ten years of on-and-off thinking about infinite regresses
in epistemology. It draws on several of our papers, which, partly because of 
the development of our thoughts, are not always well connected. 

Our overall purpose here is to show how our understanding of infinite
epistemic chains benefits from an analysis of justification in terms of
probability theory. It has often been assumed that epistemic justification is
probabilistic in character, but we think that the consequences of this assumption
for the epistemic regress problem have been insufficiently taken into account.

The book has eight chapters, detailed calculations having been relegated 
to appendices. Chapter 1 contains an introduction to the epistemological 
regress problem, giving some historical background, and recalling its three
attempted solutions: foundationalism, coherentism, and infinitism. Chapter 2
discusses different views on epistemic justification, since they bear on both 
the framing of the problem and its proposed solution. Chapters 3 and 4 form 
the core of the book. Taking as our point of departure a debate between 
Clarence Irving Lewis and Hans Reichenbach, we introduce the concept of a 
probabilistic regress, and we explain how it leads to a phenomenon that we 
call fading foundations: the importance of a foundational proposition dwin- 
dles away as the epistemic chain lengthens. In Chapters 5 and 6 we describe 
how a probabilistic regress resists the traditional objections to infinite epis- 
temic chains, and we reply to objections that have been raised against prob- 
abilistic regresses themselves. Chapter 7 compares a probabilistic regress to 
an endless hierarchy of probability statements about probability statements; 
it is demonstrated that the two are formally equivalent. In the final chapter 
we leave one-dimensional chains behind and turn to multi-dimensional net- 
works. We show that what we have found for linear chains applies equally 
to networks that stretch out in many directions: the effect of foundational
propositions fades away as the network expands. 

Epistemic regresses are not the only regresses about which philosophers 
have wracked their brains. The ancient Greeks and the mediaeval scholastics 
worried a lot about infinite causal chains, and more recently philosophers have
shown interest in the phenomenon of grounding. Although we remain silent 
about the latter, and only tangentially touch upon the former, we believe 
that our analysis could shed light on causal regresses — on condition that 
causality is interpreted probabilistically. 

We owe much to others who have concurrently been thinking about epis- 
temic regresses, notably Peter D. Klein and Scott F. Aikin. Peter Klein de- 
serves the credit for being the first to set the cat among the pigeons by sup- 
posing that infinite regresses in epistemology are not prima facie absurd. 
With Scott Aikin one of us organized a workshop on infinite regresses in 
October 2013 at Vanderbilt University. This resulted in a special issue of 
Metaphilosophy (2014, vol. 45 no. 3), which was soon followed by a special 
issue of Synthese (2014, vol. 191 no. 4), co-edited with Sylvia Wenmackers. 

The writing of this book has been made possible by financial support 
from the Dutch Organization for Scientific Research (Nederlandse Organ- 
isatie voor Wetenschappelijk Onderzoek, NWO), grant number 360-20-280. 
Our colleagues at the Faculty of Philosophy of the University of Groningen 
provided support of many different kinds. This has meant a lot to us and we 
thank them very much. 


Aix-en-Provence, October 2015 


David Atkinson and Jeanne Peijnenburg 
Chapter 1
The Regress Problem 


Abstract 

The attempt to justify our beliefs leads to the regress problem. We briefly 
recount the problem’s history and recall the two traditional solutions, foun- 
dationalism and coherentism, before turning to infinitism. According to in- 
finitists, the regress problem is not a genuine difficulty, since infinite chains 
of reasons are not as troublesome as they may seem. A comparison with 
causal chains suggests that a proper assessment of infinitistic ideas requires 
that the concept of justification be made clear. 


1.1 Reasons for Reasons: Agrippa’s Trilemma 


We believe many things: that the earth is a spheroid, that Queen Victoria 
reigned for more than sixty years, that Stockholm is the capital of Finland,
that the Russians were the first to land on the moon. Some of these beliefs 
are true, others are false. A belief might be true by accident. Suppose I have 
a phobia which makes me believe that there is a poisonous snake under my 
bed. After many visits to a psychiatrist and intensive therapy I gradually 
try to convince myself that this belief stems from traumatic and suppressed 
childhood experiences. One fine day I finally reach the point where I, nervous
and trembling, force myself to get into bed without first looking under
it. Unbeknownst to me or the psychiatrist, however, a venomous snake has 
escaped from the zoo and has ensconced itself under my bed. My belief in 
the proposition ‘There is a poisonous snake under my bed’ is true, but it
is accidentally true. I do not have a good reason for this belief, since I am 


© The Author(s) 2017
D. Atkinson, J. Peijnenburg, Fading Foundations, Synthese Library 383, 
DOI 10.1007/978-3-319-58295-5_1 




ignorant of the escape and agree with the psychiatrist that reasons based on 
my phobia are not good reasons. 

If however a belief is based on good reasons, we say that it is epistemically 
justified. Had I been aware that the snake had escaped and had in fact
made its way to my bedroom, I would have been in possession of a
good reason, and would have been epistemically justified in believing that 
the animal was lying under my bed. 

According to a venerable philosophical tradition, a true and justified belief 
is a candidate for knowledge. One of the things that is needed in order for 
me to know that there is a snake under my bed is that the good reason I have 
for it (namely my belief that the reptile has slipped away and is hiding in my
room) is itself justified. Without that condition, my reason might itself be a
fabrication of my phobic mind, and thus ultimately fall short of being a good 
reason. 

What would count as a good reason for believing that a snake has es- 
caped and installed itself in my bedroom? Here is one: an anxious neighbour 
knocks on my door, agitatedly telling me about the escape. But how do I 
know that what the neighbour says is true? It seems I need a good reason 
for that as well. My friendly neighbour shows me a text message on his cell- 
phone, just sent by the police, which contains the alarming news. That seems 
to be quite a good reason — although, how do I know that the police are well 
informed? I need a good reason for that as well. I call the head of police, who 
confirms the news, and says that he was apprised of it by the director of the 
zoo; I call the director, who tells me that the escape has been reported to her 
by the curator of the reptile house, and so on. True, my actions are somewhat 
curious, and they may well signal that a phobia for snakes is not the only 
mental affliction that plagues me. The point however is not a practical but a 
principled one. It is that a reason is only a good reason if it is backed up by 
another good reason, which in turn is backed up by yet another good
reason, and so on. We thus arrive at a chain of reasons, where the
proposition ‘There is a dangerous snake under my bed’ (the target proposition q) is
justified by ‘A neighbour knocks on my door and tells me that a snake has
escaped’ (reason A1), which is justified by ‘The police sent my neighbour a
text message about the escape’ (reason A2), which is justified by A3, and so
on:

$$ q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \ldots \qquad (1.1) $$


Such a justificatory chain, as we shall call it, gives rise to the regress problem. 
It places us in a position where we have to choose between two equally 
unattractive options: either the chain must be continued, for otherwise we 




cannot be said to know the proposition q, or the chain must come to a stop,
but then it seems we are not justified in claiming that we really can know q,
since there is no reason for stopping. Laurence Bonjour called considerations 
relating to the regress problem “perhaps the most crucial in the entire theory 
of knowledge”, and Robert Audi observes that no epistemologist quite knows 
how to handle the problem.¹

The roots of the regress problem extend far back into epistemological 
history, and scholars often refer to the Greek philosopher Agrippa. Little 
is known about Agrippa, apart from the fact that he probably lived in the 
first century A.D. and might have been among the group of sceptics dis- 
cussed by Sextus Empiricus, a philosopher and practising physician who al- 
legedly flourished a century later. Sextus’ most famous work, Outlines of 
Pyrrhonism, contains an explanation and defence of what he takes to be the 
philosophy of another shadowy figure, namely Pyrrho of Elis (c. 365-270 
B.C.), who himself wrote nothing, but became known for his sober life style 
and his aversion to academic or theoretical reasoning. So-called Pyrrhonian 
scepticism advocates the attainment of ataraxia, a state of serene calmness in 
which one is free from moods or other disturbances. An important technique 
for reaching this state is the practice of argument strategies known as tropoi
or modes, i.e. means to engender suspension of judgement by undermining 
any claim that conclusive knowledge or justification has been attained. For 
example, if it were claimed that a particular sound is known to be soft, a 
Pyrrhonian would point out that to a dog it is loud, and that we cannot judge 
the loudness or softness independently of the hearer. Typically, a Pyrrhonian 
will try to thoroughly acquaint himself with the modes, so that reacting in 
accordance with them becomes as it were a second nature. In this manner he 
will be able to routinely refrain from assenting to any weighty proposition 
q or ~q, and thus avoid getting caught up in one of those rigid intellectual 
positions that he loathes so much. 

In Book 1 of Outlines of Pyrrhonism, Sextus discusses five modes which 
he attributes to “the more recent Sceptics” (to be distinguished from what 
he calls “the older Sceptics”), and which Diogenes Laertius in the third century
would identify with “Agrippa and his school”.² Of these five modes the


1 Bonjour 1985, 18; Audi 1998, 183-184. The thought is echoed by Michael Huemer
when he writes that regress arguments “concern some of the most fundamental
and important issues in all of human inquiry” (Huemer 2016, 16).

2 Sextus Empiricus, Outlines of Pyrrhonism, Book I, 164; see p. 40 in the transla- 
tion Outlines of scepticism by Julia Annas and Jonathan Barnes. Diogenes Laertius, 
Lives of eminent philosophers, Volume 2, Book 9, 88. We thank Tamer Nawar and 
an anonymous referee for guidance in matters of ancient philosophy. 
three that are of especial interest are the Mode of Infinite Regress, the Mode
of Hypothesis, and the Mode of Circularity or Reciprocation. Here is how 
Sextus explains them: 


In the mode deriving from infinite regress, we say that what is brought for- 
ward as a source of conviction for the matter proposed itself needs another 
source, which itself needs another, and so on ad infinitum, so that we have no 
point from which to begin to establish anything, and suspension of judgement 
follows. ... We have the mode from hypothesis when the Dogmatists, being 
thrown back ad infinitum, begin from something which they do not establish 
but claim to assume simply and without proof in virtue of a concession. The 
reciprocal mode occurs when what ought to be confirmatory of the object 
under investigation needs to be made convincing by the object under inves- 
tigation; then, being unable to take either in order to establish the other, we 
suspend judgement about both.³


In other words, whenever a ‘dogmatist’ (as Sextus calls any philosopher who 
is not a Pyrrhonian sceptic) claims that he knows a proposition q, the Pyrrhonian
sceptic will ask him what his reason is for q. After the dogmatist has
given his answer, for example reason A1, the sceptic will ask further: what
is your reason for A1? In the end it will become clear that the dogmatist has
only three options open to him, jointly known as ‘Agrippa’s Trilemma’: 


1. He goes on giving reasons for reasons for reasons, without end. 

2. He stops at a particular reason, claiming that this reason essentially justi- 
fies all the others that he has given. 

3. He reasons in a circle, where his final reason is identical to his first. 


In the first case the justificatory chain is infinitely long, in the second case 
it comes to a halt, and in the third case it forms a loop. The sceptic is quick 
to point out that none of these options can be accepted as a justification for 
q. The first option is impossible from a practical point of view, since we are 
ordinary human beings with a restricted lifespan. Moreover, even if we were 
to live forever, continuing to give reason after reason, we would never reach 
the origin of the justification, since by definition the chain does not have an 
origin. The second option is also unsatisfying. For why do we stop at this 
particular reason and not at another? If we can answer this question, we have 
a reason for what we claimed is without a reason, so we actually did not 
stop the chain. And if we cannot answer the question, then stopping at this 
particular reason is arbitrary. The third option is likewise unacceptable, for 


3 Sextus Empiricus, Outlines of scepticism. Book I, 166-169. Translation by Julia 
Annas and Jonathan Barnes, 41. 
justifying the object under investigation by calling on that very object is not
particularly convincing. 

The Pyrrhonian takes the moral of this discouraging story to be that we 
are never justified in claiming that we know a proposition q. Proposition q 
might be true or it might be false; we simply have no way to know for sure.
The only viable option open to us is to suspend judgement. Suspension of 
judgement (epoche) does not imply that we will be paralyzed; it does not 
mean that we cannot form any beliefs, are incapable of making decisions, or 
cannot perform actions on the basis of these decisions. Although we should 
desist from making a truth-claim, it is perfectly acceptable to abide by ap- 
pearances, customs, and natural inclinations, and to act in accordance with 
them. Thus, to return to our snake example, it is altogether acceptable and 
even recommended to take your neighbour’s word for it and proceed
correspondingly — that will actually make you a better and, at any rate, a more
normal person than engaging in highly abstract reasoning would. The fact that we
must have recourse to suspension of judgement should therefore not sadden
or demoralize us. Quite the contrary. We should welcome this fact and
embrace it, since that will free us from the futile and fruitless attempt to arrive
at knowledge, certainty, or justified beliefs, and bring us closer to ataraxia. 

Pyrrhonian scepticism appears to have been quite a popular philosophical 
outlook in the first century A.D. However, interest in it slowly waned in the
second and third centuries, and by the fourth the movement had practically
disappeared. 

About the same time that the Pyrrhonian movement petered out, appre- 
ciation for the ideas of the recently rediscovered Aristotle (384-322 B.C.) 
was on the rise. It turns out that Aristotle had anticipated something like the 
Agrippan Trilemma in his Posterior Analytics and in his Metaphysics. Unlike 
the Pyrrhonians, however, he does not use the trilemma as a means for argu- 
ing that we can never know a proposition. In fact the opposite is true. Rather 
than arguing that none of the three possibilities in Agrippa’s Trilemma pro- 
duces justification, Aristotle gives short shrift to possibilities one and three, 
and claims it to be evident that the second possibility is a proper justificatory 
chain, and so does give us knowledge of some kind, be it practical, theoreti- 
cal, or productive. Here is how Aristotle phrases his position in the Posterior 
Analytics, where ‘understanding’ refers to what we have called ‘knowledge’, 
and where ‘demonstration’ is used for ‘justification’: 


Now some think that because one must understand the primitives there is no 
understanding at all; others that there is, but that there are demonstrations of 
everything. Neither of these views is either true or necessary. 
For the one party, supposing that one cannot understand in another way,
claim that we are led back ad infinitum on the ground that we would not un- 
derstand what is posterior because of what is prior if there are no primitives; 
and they argue correctly, for it is impossible to go through infinitely many 
things. And if it comes to a stop and there are principles, they say that these 
are unknowable since there is no demonstration of them, which alone they say 
is understanding; but if one cannot know the primitives, neither can what de- 
pends on them be understood simpliciter or properly, but only on the suspicion 
that they are the case. 

The other party agrees about understanding; for it, they say, occurs only 
through demonstration. But they argue that nothing prevents there being 
demonstration of everything; for it is possible for the demonstration to come 
about in a circle and reciprocally. 

But we say that neither is all understanding demonstrative, but in the case of 
the immediates it is non-demonstrable — and that this is necessary is evident; 
for if it is necessary to understand the things which are prior and on which the 
demonstration depends, and it comes to a stop at some time, it is necessary for 
these immediates to be non-demonstrable. So as to that we argue thus; and we also
say that there is not only understanding, but also some principle by which we 
become familiar with the definitions.⁴


A similar reasoning can be found in the Metaphysics: 


There are [people who demand] that a reason shall be given for everything; for 
they seek a starting-point, and they wish to get this by demonstration, while it 
is obvious from their actions that they have no conviction. But their mistake 
is what we have stated it to be; they seek a reason for that for which no reason 
can be given; for the starting-point of demonstration is not demonstration.⁵


This is not the place, nor do we have the competence, to deal with historical
details or with intricacies of translation from the Greek. Relevant for our
purpose is the observation that the above passages of Aristotle herald the 
birth of what in contemporary epistemology became known as foundation- 
alism. Foundationalism comes in various shapes and sizes, but its essence is 
an adherence to a foundation, be it a basic belief, a basic proposition, or even 
a basic experience. It thus can be described as joining Aristotle in embrac- 
ing the second option of Agrippa’s trilemma. Like Aristotle, foundationalists 
maintain that justified beliefs come in two kinds: the ones that do, and the 
ones that do not depend for their justification on other justified beliefs. It is 
not always clear what the nature of the latter kind is, but in most versions of 


4 Aristotle 1984a, Posterior Analytics, Book I, Chapter 3, 72b 5-24. Translation by
Jonathan Barnes, 117. 

5 Aristotle 1984c, Metaphysics, Book IV, Chapter 6, 1011a 3-13. Translation by 
W.D. Ross, 1596. 
foundationalism these justified beliefs are in some sense self-evident and so
not in need of other beliefs for their justification. 

During the Middle Ages foundationalism became the dominant school of 
thought concerning the structure of justification. Especially Thomas Aquinas 
(1225-1274), whose Aristotelian outlook so greatly influenced Western epis- 
temology, contributed to the view that the Agrippan Trilemma could be re- 
solved by a foundationalist response to the regress problem. In his Com- 
mentary on Aristotle’s Posterior Analytics, Aquinas starts by defending the 
traditional view that knowledge (scientia) of a proposition q implies that one
has a particular kind of justification for q. The justification for q is either
inferential or non-inferential. In the first case q is justified by another
proposition, for example A1, that is both logically and epistemically prior to q; here
we know q per demonstrationem, that is, through A1. In the second case we
know q by virtue of itself (per se nota). Aquinas follows Aristotle in arguing 
that inferential justification cannot exist without non-inferential justification. 
We may know many propositions per demonstrationem, but in the end every 
justificatory chain must culminate in a proposition that we know per se. 

The end of the fifteenth century evinced renewed interest in Sextus Empir- 
icus, whose texts were brought to Italy from Byzantium. A Latin translation 
of Sextus’ Outlines, which appeared in 1562 in Paris under the title Pyrrho- 
niarum Hypotyposes, kindled the interest of European humanists, who had 
a taste for using sceptical arguments in their attack not only on astrology 
and other pseudo-science, but also on mediaeval scholasticism and forms of 
all too rigid Aristotelianism.⁶ An important role in the revival of Pyrrhonian
scepticism in the sixteenth century was played by the French philosopher 
and essayist Michel de Montaigne (1533-1592). In the manner of Sextus 
and Pyrrho, Montaigne stressed that knowledge cannot be obtained, and that 
we should suspend judgement on all matters. He accordingly propagated tol- 
erance in moral and religious matters, as Pyrrho had done, and espoused an 
undogmatic adoption of customs and social rules. 

Although Montaigne’s work was highly influential at the beginning of the 
seventeenth century, his impact was soon overshadowed by the authority of 
his compatriot René Descartes (1596-1650). This supersession turned out 
to be definitive: when today epistemologists talk about philosophical scep- 
ticism, they generally have Descartes rather than Montaigne or Pyrrho in 
mind. Cartesian scepticism is however quite different from scepticism in the 


6 Thanks to Lodi Nauta for helpful conversations. 
Pyrrhonian vein.⁷ Whereas Pyrrhonians cheerfully embrace the adage that
knowledge cannot be had because information obtained by the senses and by 
reason is unreliable, Descartes aims at no less than a theory of everything, a 
coherent framework that could explain the entire universe. The way in which 
he tried to reach this goal has become part of the canon: in an attempt to 
arrive at a proposition that can resist all doubt, so as to make it the basis on 
which to erect his all encompassing framework, Descartes applies his scepti- 
cal method of doubting every proposition that could possibly be false. Thus 
he arrives at the allegedly indubitable truth of the cogito ergo sum. But of 
course, the adherence to the cogito as the foundation for all our knowledge 
eventually makes him more a foundationalist than a sceptic. In a sense, the 
difference between the two kinds of scepticism could not be greater: whereas 
a Pyrrhonian uses the sceptical method as a means towards ataraxia, the state 
of imperturbability where one is at peace with the supposed fact that knowl- 
edge cannot be had, for Descartes it is a way of acquiring knowledge of the 
entire external world and of our place therein. 


1.2 Coherentism and Infinitism 


Already in the seventeenth century there was severe criticism of the cogito, 
and of the whole Cartesian method of doubt. The foundationalist thrust of 
Descartes’ philosophy, however, was generally accepted, since it harmonized 
perfectly with the dominant tradition in epistemology. Most philosophers be- 
fore Descartes were foundationalists concerning justification, as were many 
after him. The English empiricists of the eighteenth century, John Locke, 
George Berkeley, and David Hume, all had a foundationalist outlook. The 
same can in a sense be said of the great German philosopher of the En- 
lightenment, Immanuel Kant, although he appears to have been a bit more 
cautious. In his Critique of Pure Reason he emphasizes that from the fact 
that every event has a cause, it does not follow that there is a cause for ev- 
erything. Similarly, from the fact that every proposition has a reason, it does 
not follow that there is a reason for the entire justificatory chain.⁸ Yet, says


7 For a good explanation of the differences between Cartesian and Pyrrhonian scep- 
ticism, see Williams 2010. 

8 The difference is nowadays known as one of the scope distinctions. The statement
‘For each y there is an x to which y stands in the relation R’ (∀y ∃x yRx) differs
from ‘There is an x to which each y stands in the relation R’ (∃x ∀y yRx). Standard
example: ‘Every mammal has a mother’ differs from ‘There is something that is the
Kant, humans have a natural inclination to posit such a foundational cause
or reason, and Kant’s text does not always make it very clear whether this 
inclination should be resisted or put to practical use. 
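Kant's point turns on the scope distinction described in note 8, and it can be checked mechanically on a small finite model. The following is a minimal sketch in Python, assuming a hypothetical toy model: the names `offspring`, `candidates`, `mother_of`, and `R` are illustrative only and chosen so that the ∀∃ statement holds while the ∃∀ statement fails.

```python
# Hypothetical toy model for the scope distinction (illustrative data only).
# y ranges over offspring, x over candidate mothers;
# R(y, x) holds when x is the mother of y.
offspring = ["pup1", "pup2", "pup3"]
candidates = ["doe1", "doe2", "doe3"]
mother_of = {"pup1": "doe1", "pup2": "doe2", "pup3": "doe3"}


def R(y: str, x: str) -> bool:
    """y stands in relation R to x: x is the mother of y."""
    return mother_of[y] == x


# ∀y ∃x yRx: every offspring has some mother.
every_y_has_an_x = all(any(R(y, x) for x in candidates) for y in offspring)

# ∃x ∀y yRx: one single x is the mother of every offspring.
one_x_for_all_y = any(all(R(y, x) for y in offspring) for x in candidates)

print(every_y_has_an_x, one_x_for_all_y)  # True False
```

Swapping the nesting of `all` and `any` is exactly the swap of quantifier order. In this model every y has its own x, yet no single x serves for all y, just as every proposition in a chain may have a reason without there being one reason for the chain as a whole.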

In the nineteenth century, Hegel developed an anti-foundationalist epis- 
temology, as did Nietzsche, but it was not until the twentieth century that 
a serious alternative to foundationalism surfaced in the form of coherentism 
(although major figures in the twentieth century like Bertrand Russell, Alfred 
Ayer, and Rudolf Carnap remained convinced foundationalists). The main 
motivation behind the rise of coherentism was dissatisfaction with the foun- 
dationalist approach, especially with the idea that basic beliefs are somehow 
self-justifying and could exist autonomously. “No sentence enjoys the noli 
me tangere which Carnap ordains for the protocol sentences”, writes Otto 
Neurath in 1933 about Carnap’s attempt to logically re-erect the world from 
a bedrock of basic elements or protocol sentences, as he calls them.⁹ According
to Neurath and other coherentists, sentences are always compared to
other sentences, not to experiences or ‘the world’ or to sentences that have 
some sort of sovereign standing.¹⁰

Coherentism is described in many textbooks as the attempt to put an 
end to the regress problem by embracing the third alternative of Agrippa’s 
Trilemma. For example, A2 can be a reason for A1, which is a reason for q,
which in turn is a reason for A2. The position is however markedly more so- 
phisticated: rather than advocating reasoning in a circle, it maintains that jus- 
tification is not confined to a finite or ring-shaped justificatory chain. What 
is justified, according to coherentists, are first and foremost entire systems 
of beliefs or propositions, not individual elements in these systems. Justifi- 
cation of individual beliefs through one-dimensional justificatory loops is a 
special case only, a degenerate form of the holistic process that constitutes 
justification. 

According to coherentism, the more coherent a system is, the more it is 
justified. But what exactly does it mean to say that beliefs in a system cohere 
with one another in that system? Twentieth-century coherentists have worked
hard to find a satisfying definition of ‘coherence’, but Laurence Bonjour has 
argued that there is no simple answer to the question, since coherence depends


mother of all mammals’. The difference was already acknowledged in the Middle
Ages and perhaps even by Aristotle, but has not always been applied consistently
across the board.

9 Neurath 1932-1933, 203. See also Carnap 1928.

10 In the telling words of Donald Davidson: “what distinguishes a coherence theory
is simply the claim that nothing can count as a reason for holding a belief except
another belief” (Davidson 1986, 310).

on many different conditions being fulfilled; in fact, an entire list of
different coherence criteria can be made.¹¹ A complicating factor in finding
a definition of coherence is that we want this definition also to incorporate 
a measure, so that we can determine how coherent a particular system is. 
Many ingenious suggestions for formal coherence measures have been put 
forward.¹² All these measures are vulnerable to a classic criticism, namely
that coherence is not truth-conducive: a system of propositions can be co- 
herent to the highest degree while all of the propositions are in fact false. 
The criticism was already ventilated by Bertrand Russell at the beginning 
of the twentieth century and is sometimes referred to as the Bishop Stubbs 
objection: 


Whatever the standards of coherence may be, it seems likely that alternative 
sets of propositions will meet them: as Russell 1906 pointed out, although 
the highly respectable Bishop Stubbs died in his bed, the proposition “Bishop 
Stubbs was hanged for murder” can readily be conjoined with a whole group 
of others to form a set which passes any plausible coherence test; and indeed, 
the same can be said of the propositions that make up any good work of realistic
fiction.¹³


In fact the Bishop Stubbs objection to coherentism cuts even deeper than 
Russell envisaged. As Luc Bovens and Stephan Hartmann showed in 2003, 
a system which is more coherent than another system cannot even be said to 
have a higher probability of being true than the other system.¹⁴

At the beginning of the twenty-first century a third approach to the episte- 
mological regress problem entered the philosophical arena, one that is now 
known as ‘infinitism’. While foundationalism and coherentism are said to 
avoid the regress problem by opting for, respectively, the second and the third
possibility of Agrippa’s Trilemma, infinitism chooses the first. According
to infinitists, it is not prima facie absurd that the process of giving reasons 
for reasons might go on without end, so that the justificatory chain will be 
infinitely long. 


11 Bonjour 1985, 97-99. 

12 See for example Olsson 2001, 2002, 2005a, 2005b; Shogenji 1999. For the rela- 
tion between coherence and confirmation, see Fitelson 2003; Dietrich and Moretti 
2005; Moretti 2007. For defences of coherentism in general, see Quine and Ullian 
1970; Rescher 1973; Bonjour 1985; Davidson 1986; Lehrer 1997. 

13 Walker 1997, 310. Although in this quotation Walker refers to propositions, a 
similar objection, albeit one that is somewhat more complicated, could be made 
with reference to beliefs. Ibid., 316. For Russell’s argument, see Russell 1906.

14 Bovens and Hartmann 2003. See also Olsson 2005b. 
To say that infinitism has consistently had a bad press would be claiming
too much. Infinitism had no press at all, since until very recently nobody
took it seriously. The reason for this is not difficult to discern. In an epistemo- 
logical tradition dominated by Aristotelian and Cartesian foundationalism, a 
position like infinitism is highly counterintuitive to say the least; for how 
could anybody, in Aristotle’s words, “go through infinitely many things”? It 
is therefore not surprising that infinitism is rarely, if ever, mentioned in
treatises or textbooks; and if it is mentioned, then it usually serves as an example
of a blatantly ridiculous way to go. Yet it cannot be denied that infinitism sits 
well with some modern ideas about the nature of knowledge, such as that 
knowledge is essentially fallible, and that the human search for it is, indeed, 
without end. Despite many attempts to show the contrary, it is not at all clear 
how these ideas, which so many of us endorse, can be smoothly combined 
with foundationalism or even coherentism.15

In this book we will investigate the consequences of an infinitist response 
to the regress problem. We do not propose to defend infinitism as such. 
Rather, our aim is twofold. On the one hand, we intend to show that some
standard objections to the position are not as strong as they might seem at
first sight. On the other hand, we will explain how our analysis of these
objections brings about insights that cast new light on the traditional positions,
foundationalism and coherentism; as we will see, a careful analysis of infi- 
nite justificatory chains will teach us interesting novel facts about finite ones. 
In the end we try, in a sense, to have it all, sketching the contours of an
infinitist version of coherentism that also acknowledges the foundationalist
lesson that we should somehow make contact with the world. We will return to this
in the final chapter. 

All-important for the development of infinitism was the work of Peter
Klein. Around 2000 Klein wrote a number of papers in which he took the bull 
by the horns and presented infinitism as a genuine competitor to coherentism 
and foundationalism. Here is how Klein introduces his view in a relatively 
early paper: 


The purpose of this paper is to ask you to consider an account of justification 
that has largely been ignored in epistemology. When it has been considered, it 
has usually been dismissed as so obviously wrong that arguments against it are 
not necessary. The view that I ask you to consider can be called “Infinitism”. 
Its central thesis is that the structure of justificatory reasons is infinite and 
nonrepeating. My primary reason for recommending infinitism is that it can
provide an acceptable account of rational beliefs, i.e. beliefs held on the
basis of adequate reasons, while the two alternative views, foundationalism and
coherentism, cannot provide such an account.16


15 For a prominent attempt at reconciling foundationalism and fallibilism, see Audi
1998.


Klein is a convinced advocate of infinitism. As he sees it, infinitism is not 
just a third way to solve the regress problem alongside two other approaches;
it is the only viable solution to the regress problem.17 The reason is that
infinitism is the only account that can satisfy “two intuitively plausible con- 
straints on good reasoning” which jointly entail that the justificatory chain is 
infinite and non-repeating.18 The two constraints are the Principle of Avoiding
Circularity (PAC) and the Principle of Avoiding Arbitrariness (PAA).
Here is Klein about the first constraint: 


PAC: For all q, if a person, S, has a justification for q, then for all $A_i$, if $A_i$ is
in the evidential ancestry of q for S, then q is not in the evidential ancestry of
$A_i$ for S.19


By the term ‘evidential ancestry’ Klein refers to the order of the links in the 
justificatory chain for q. So in our justificatory chain (1.1), proposition $A_2$ is
in the evidential ancestry of $A_1$ and q, and $A_3$ is in the evidential ancestry of
$A_2$, $A_1$ and q. Klein considers PAC to be “readily understandable and requires
no discussion”, and hence refrains from further defending it.20


16 Klein 1999, 297. The term ‘infinitism’ was, however, not coined by Klein. He
credits Paul Moser with inventing the term; Moser uses “epistemic infinitism” to
refer to “inferential justification via infinite justificatory regresses” (Moser 1984,
199). See Klein 1998, 919, footnote 1. Charles Sanders Peirce is often paraded as
the first infinitist (Peirce 1868), but James Van Cleve has suggested that what Peirce
actually defends is “the possibility that each cognition of an object be ‘determined’
by an earlier cognition”, not the possibility of an infinite regress of justification (Van
Cleve 1992, 357, footnote 29).

17 “I conclude that neither foundationalism nor coherentism provides an adequate 
non-skeptical response to the epistemic regress problem. Only infinitism does.” 
(Klein 2011a, 255); “...only infinitism is left as a possible solution on offer to 
the regress problem” (Klein 2007, 16). In his later work, however, Klein made a 
plea for a “rapprochement” between foundationalism and infinitism by arguing that 
basic beliefs are contextual: whether a particular belief is basic or not depends on 
the context (Klein 2014). John Turri also made an attempt to bring foundationalism
and infinitism together by presenting an example of a justificatory chain which,
although infinite, can nevertheless be handled by foundationalists (Turri 2009, 161- 
163). For criticism of this example, see Peijnenburg and Atkinson 2011, Section 6, 
and Rescorla 2014, 181-182. 

18 Klein 1999, 298. 

19 Ibid., 298-299. For convenience we have adjusted Klein’s notation.

20 Klein 2005, 136. 
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The Principle of Avoiding Arbitrariness is: 


PAA: For all q, if a person, S, has a justification for q, then there is some
reason, $A_1$, available to S for q; and there is some reason, $A_2$, available to S
for $A_1$; etc.


In contrast to the first constraint, PAA is likely to generate a lot of discussion. 
For what does it mean to say that a proposition $A_n$ is available to S as a
reason for $A_{n-1}$? The answer to this question is clearly very important, for it
involves what we mean by ‘epistemic justification’, and thus what we mean 
by the arrow in our justificatory chain: 


$$q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \ldots$$


Although Klein acknowledges the importance of the question, he believes 
that the discussion about the pros and cons of infinitism can be carried out 
without delving into the matter. He argues that $A_n$ is available to S as a
reason for $A_{n-1}$ if and only if $A_n$ is both objectively and subjectively available.
Objective availability is about the relation between two propositions: $A_n$ is
objectively available as a reason for $A_{n-1}$ if and only if it really is a reason
for $A_{n-1}$. Klein remarks that what makes a proposition a reason “need
not be fleshed out”, since “there are many alternative accounts that could
be employed by the infinitist”; hence the “thorny issue” of what makes a
proposition a reason “can be set aside”.21 Subjective availability is about the
relation between a proposition and a person: $A_n$ is subjectively available as
a reason to S if and only if $A_n$ is “appropriately ‘hooked up’ to S’s beliefs
and other mental contents”.22 It need not imply that S actually believes or
endorses $A_n$; it only means that S must in some sense be able to “call on”
$A_n$.23 For example, it is not necessary for S to know or believe that
366 + 71 = 437 in the sense in which S knows or believes that 2 + 2 = 4. It is
enough for subjective availability if S is able to do the calculation when called
on to do so. In Klein’s words, “The proposition that 366 + 71 = 437 is subjectively
available to me because it is correctly hooked up to already formed
beliefs.”24

21 Ibid., 136-137.

22 Ibid., 136.

23 Klein 1999, 300, 308-309.

24 Ibid., 308. Coos Engelsma has argued that Klein’s distinction between objective
and subjective availability can be variously interpreted (Engelsma 2014, 2015).

Unlike Klein, we do not believe that an investigation into the viability of
infinitism can evade the question as to what makes a proposition a reason for
another proposition. Thorny as the issue may be, the meaning of ‘justification’
cannot be set aside if we want to examine whether chains of justification
must be finite or can be infinite. Klein is right that there exist many different
accounts of epistemic justification, but not all of these accounts can
be used without problems. Some of the accounts will be useful to infinitists,
while others might be more advantageous to foundationalists or coheren- 
tists. It is therefore important to have an account of justification, however 
provisional it may be, on which everybody agrees, and then see whether this 
account allows infinite justificatory chains — and if so, in what sense. 

William Alston has argued that such a neutral account of justification is 
impossible.25 In his view, no definition of justification can serve as an
impartial starting point or as a tool for adjudicating epistemological debates.
Every definition will eventually take sides, and favour a particular position 
in the epistemological debate about the structure of justification. Alston’s
advice to the epistemological community therefore is to abstain from attempts
at defining justification and instead turn to spelling out what he calls ‘epis- 
temic desiderata’. That will be more fruitful for the theory of knowledge than 
undertaking ill-fated attempts to find a definition of justification. 

Alston’s point is well taken, but we think it applies primarily to material 
accounts of justification, less so to formal ones. As we will argue in Chapter 
2, focusing on the formal properties of epistemic justification might generate 
more consensus than Alston deems possible. Moreover, as we will show in 
Chapters 3 to 6, a focus on formal properties casts doubt on several objec- 
tions to the idea that justificatory chains can be infinitely long. In the end, our 
formal explication of justification provides us with means to preserve many, 
although not all, of Peter Klein’s intuitions about the value of infinitism. 


1.3 Vicious Versus Innocuous Regress 


Epistemology is of course not the only place where infinite regresses occur. 
They can also be found in other philosophical disciplines, as well as in areas 
outside philosophy. Many of these regresses are not troublesome at all.
Mathematics in particular abounds with benign regresses: every integer has
both a successor and a predecessor, every line segment can be divided into 
two, every natural number can be doubled, and so on. Outside mathematics 
there are benign regresses too, such as the regress arising from the statement 


25 Alston 1989, 1993, 2005a. 
that, in arriving at the Louvre, I had already reached the midpoint of the
distance to the Louvre, and the midpoint of the distance to the first midpoint,
and so on. 

How do we distinguish between vicious and harmless regresses?
This is an intriguing question, but an attempt to answer it might be overly 
ambitious. As Daniel Nolan has argued, it will be difficult if not impossi- 
ble to find a general answer: there simply is not one criterion that applies 
to all cases.26 A more feasible plan, although still not an easy one, is to ask
ourselves why exactly it is that justificatory regresses are widely perceived 
as vicious. Why are infinite justificatory chains readily consigned to the bad 
batch? The fact that they have been treated with hostility or neglect goes al- 
most without saying. “It can hardly be pretended”, writes David Armstrong, 
“that this reaction to the regress [i.e. calling it virtuous] has much plausibil- 
ity. ...it is a desperate solution, to be considered only if all others are clearly 
seen to be unsatisfactory”.27 Here are a few more quotations that serve as
illustrations. All are taken from epistemology textbooks which were published
after Peter Klein launched his controversial view, for earlier books are often 
simply silent about the possibility. 


26 Nolan 2001. The same point is made by Nicholas Rescher (2010): “There is
nothing vicious about regresses as such” (ibid., 21); “Infinite regression is not
something that is absurd as such, involving by its very nature a fault or failing
that can be condemned across the board. Its viciousness will depend on the
specifics of the case.” (ibid., 62). Even so, Rescher offers several rules of thumb
for distinguishing a benign from a vicious regress. One of them involves the
difference between regresses that are time-compressible and those that are not;
the former are often harmless, but the latter may well be vicious: “any regress
that requires the realization of an infinitude of [not time-compressible] actions
is thereby vicious” (ibid., 53). A related distinction is that between consequences
or co-conditions on the one hand and pre-conditions or pre-requisites on the
other hand (ibid., 55-61). The former are time-compressible; the latter are not. So
a regress with consequences or co-conditions will often be harmless, while a
regress with pre-conditions or pre-requisites will mostly be vicious. We briefly
return to time-compressibility in Chapter 5.

Michael Huemer has made the interesting suggestion that an infinite regress is
vicious (i.e. cannot exist) if it requires the instantiation of “an infinite intensive
magnitude” (Huemer 2014, 88). He considers this suggestion to be a first step
towards “a new theory of the vicious infinite” (ibid., 95). On evaluating infinite
regress arguments in general, see Gratton 2009, which is a study in argumentation
theory; Wieland 2014 also deals with the subject.

27 Armstrong 1973, 155.


We humans, for better or worse, do not have an infinite amount of time.
... Evidently, then, proponents of infinitism have some difficult explaining to
do. As a result, infinitism has attracted very few public supporters throughout
the history of epistemology. It is, nonetheless, a logically possible approach 
to the regress problem, at least according to some philosophers.28


The least plausible . . . response to Agrippa’s trilemma involves . . . holding that 
an infinite chain of justification can justify a belief. The position is known as 
infinitism. On the face of it, the view is unsustainable because it is unclear 
how an infinite chain of grounds could ever justify a belief any more than an 
infinite series of foundations could ever support a house. Nevertheless, this 
view does have some defenders ...29


[Infinitism] tells us that evidential chains can be infinitely long, and so need 
not terminate. [It] allows that [a belief] can be supported by an evidential 
chain that has an infinite number of links ...Such an infinite chain would 
have no final or terminating link. One difficulty with this option is that it 
seems psychologically impossible for us to have an infinite number of beliefs. 
If it is psychologically impossible to have an infinite number of beliefs, then 
none of our beliefs can be supported by an infinite evidential chain.30


For one thing, justifications that never come to an end are not the sort of 
justifications we typically prize from the standpoint of learning more about 
the world. For another, [infinitism] seemingly would commit us to the idea 
that humans have an infinite chain of beliefs. ... Although the normal person 
undoubtedly has an indefinitely large number of beliefs, that person is unlikely
to have a limitless supply of beliefs.31


Note that three of the four cited authors criticize infinitism because it sup- 
posedly implies that people have an infinite number of beliefs. The complaint 
dates back as far as Aristotle, and is known as the finite mind objection. We 
discuss this objection in Chapter 5. For the moment we restrict ourselves to 
observing that the intuition behind the finite mind objection is not as natural
and widely shared as it may seem at first sight. Even among philosophers 
opposed to infinitism, there are some who do believe that people can have 
an infinite number of beliefs. Richard Fumerton, for example, writes in his 
paper on classical foundationalism: 


28 Moser, Mulder, and Trout 1998, 82. 
29 Pritchard 2006, 36. 

30 Lemos 2007, 48. 

31 Crumley 2009, 109-110. 
Klein is right that we do have an infinite number of beliefs, but I think he
misses the real point of the regress argument for noninferentially justified
beliefs. The viciousness of the regress is, I believe, conceptual.32


Do we have a finite mind? This is not so clear. We have finite brains, and 
minds supervene on brains, but does that mean that our mind is finite? What 
exactly does it mean to have a finite mind? That we cannot have an infinite 
number of beliefs? But how to count? Moreover, even if we have a finite 
mind in the sense that our beliefs are finite and therefore countable, this does 
not prevent us from saying many cogent things about infinities — how is that 
possible? 

The routine manner in which epistemologists have rejected infinite justi- 
ficatory chains is reminiscent of the customary ways in which infinite causal 
chains have been cast aside. Again, Aristotle appears to have played a major 
role here. His familiar arguments against infinite causal chains in his Physics 
and Metaphysics became a well-entrenched part of the philosophical canon. 
Yet Aquinas and other mediaeval scholars had already pointed out that Aris- 
totle’s arguments may be more restricted than they appear: not every causal 
regress seems to be vicious, it all depends on what is meant by ‘causal con- 
nection’. So let us take a closer look at Aristotle’s objection to causal re- 
gresses and the criticism thereof by the mediaeval schoolmen. This might 
help us to see why exactly it is that justificatory regresses have been rejected 
without much ado, and to assess whether such a hasty rejection is appropriate.
In the final section of Chapter 8 we will discuss causal chains in a more
modern setting, namely that of causal graphs.

Aristotle’s main argument against a causal regress is that it purports to 
explain a phenomenon, but in fact fails to do so. Suppose an event, an object, 
or a process A is explained by saying that it is caused by B, and B is causally 
explained by pointing to C, and so on. If this series were to go on indefi- 
nitely, it would remain unclear why A occurred in the first place. The only 
way to explain the occurrence of A is to refer to a principal or first cause, 
i.e. something that causes all the other elements in the series, but is itself 
uncaused. Aristotle stresses that his argument is not confined to a particu- 
lar kind of causation, but applies to any of the four different causes that he 
distinguishes, i.e. material, efficient, final or formal: 


Evidently there is a first principle, and the causes of things are neither an infi- 
nite series nor infinitely various in kind. For, on the one hand, one thing cannot 
proceed from another, as from matter, ad infinitum ... nor on the other hand
can the efficient causes form an endless series ... Similarly the final causes
cannot go on ad infinitum. ... And the case of the formal cause is similar. ... It
makes no difference whether there is one intermediate or more, nor whether
they are infinite or finite in number. But of series which are infinite in this
way, and of the infinite in general, all the parts down to that now present are
like intermediates; so that if there is no first there is no cause at all.33


32 Fumerton 2001, 7. We will say more about the conceptual objections to infinitism
in Chapter 6.


Aristotle’s argument is most intuitive when he talks about causation as
setting something in motion. Suppose object A moves because it is moved by 
object B, and B moves because it is moved by C, and so on. Then, unless the 
series comes to rest in an Unmoved Mover, we cannot explain why A moved 
in the first place: 


Now this [a thing being in motion] may come about in either of two ways, 
either . . . because of something else which moves the mover, or because of the
mover itself. Further, in the latter case, either the mover immediately precedes 
the last thing in the series, or there may be one or more immediate links: e.g. 
the stick moves the stone and is moved by the hand, which again is moved 
by the man; in the man, however, we have reached a mover that is not so in 
virtue of being moved by something else. Now we say that the thing is moved 
both by the last and by the first of the movers, but more strictly by the first, 
since the first moves the last, whereas the last does not move the first, and the 
first will move the thing without the last, but the last will not move it without 
the first: e.g. the stick will not move anything unless it is itself moved by the 
man. If then everything that is in motion must be moved by something, and 
by something either moved by something else or not, and in the former case 
there must be some first mover that is not itself moved by anything else, while 
in the case of the first mover being of this kind there is no need of another 
(for it is impossible that there should be an infinite series of movers, each of 
which is itself moved by something else, since in an infinite series there is no 
first term) — if then everything that is in motion is moved by something, and 
the first mover is moved not by anything else, it must be moved by itself.34


In other words, if a man moves a stone by moving a stick, the movement of 
the stone is not explained by referring merely to the movement of the stick. 
We must point to the man who moves the stick, for without him the stick 
would be at rest. The man’s own movement, however, cannot be explained 
in this manner, since the man is not moved by anybody or anything outside 
him — he moves himself. 


33 Aristotle 1984c, Metaphysics, Book II, Chapter 2, 994a, 1-19. Translation by 
W.D. Ross, 1570. 

34 Aristotle 1984b, Physics, Book VIII, Chapter 5, 256a, 4-21. Translation by R.P. 
Hardie and R.K. Gaye, 427-428. 
Thomas Aquinas pointed out that Aristotle’s picture of a causal regress
appears to be too simple. There are at least two different causal regresses, 
each of them covering Aristotle’s four causes, one being vicious and one be- 
ing benign. Aquinas and other scholastics refer to the distinction as a causal 
series per se versus a causal series per accidens. The difference should not 
be confused with the distinction we mentioned in Section 1.1 between know- 
ing a proposition per se and knowing it per demonstrationem. Nor should it 
be simply put on a par with the distinction between necessary and acciden- 
tal properties. Causal series per accidens and per se are about the ways in 
which their members are ordered, i.e. the way in which the causes in the series
are linked. A particular cause can have necessary properties but be linked to 
other causes in an accidental way. Conversely, a cause may have accidental 
properties, but be part of a series of which the members are ordered in an 
essential way. 

In a causal series per se each intermediate member (that is, each member
except the first and the last) exerts causal power on its successor by virtue of 
the causal power exerted on this member by its predecessor. Aristotle’s stone- 
stick-man example in the above citation involves such an essential ordering 
of causes. The stick causes the stone to move by virtue of the fact that the 
man causes the stick to move. This series consists of three elements, of which 
only the second (the stick) exerts causal power on its successor (the stone) 
by virtue of the causal power exerted on it by its predecessor (the man). Of 
course there will be more intermediate members if the essential ordering is 
longer. If for example the stone were to move a pebble, the stone would cause 
the pebble to move by virtue of the fact that it was moved by the stick. The 
salient point is that the intermediate members depend for their causing on 
their being caused. 

Things are different in a causal series per accidens. Here each member 
(except the last) exerts power on its successor, but not by virtue of the causal 
power exerted on it by its predecessor. The standard example is Jacob, who 
was begotten by Isaac, who in turn was begotten by Abraham. Again we 
have a series of three elements, but none of them, not even the second one, 
causes by virtue of the fact that it is caused. Isaac fathers Jacob not because 
of the fact that he was fathered by Abraham, but because of having had inter- 
course with Rebecca. A stick needs a hand to move the stone, but Isaac does 
not need Abraham to sleep with Rebecca. Of course, Isaac needs Abraham 
for his existence: if Abraham had not existed, then Isaac would not have ex- 
isted either. But neither Abraham nor Abraham’s intercourse with Sarah is 
the cause of Isaac begetting Jacob. As Patterson Brown formulates it in his 
outstanding paper on infinite causal regressions: 
Abraham’s copulation causes Isaac’s conception, Isaac’s copulation causes
Jacob’s conception, Jacob’s copulation causes Joseph’s conception. Each member
has one attribute qua effect (being conceived) and quite another attribute
qua cause (copulating).35


In an essential ordering of causes, on the other hand, the attributes qua effect 
and qua cause coincide: 


...it is the same function of the stick (namely, its locomotion) which both is 
caused by the movement of the hand and causes the movement of the stone. 
Again, a series where the fire heats the pot and the pot in turn heats the stew, 
causing it to boil, is also essentially ordered; for the warmth of the pot is both 
caused by the warmth of the fire and cause of the warmth of the stew, while 
the warmth of the stew is both caused by the warmth of the pot and cause of 
the stew’s boiling.36


The above examples suggest that the causal relation in an essentially ordered 
series is transitive, whereas the causal relation in an accidentally ordered 
series is intransitive.37 If the man moves the stick, and the stick moves the
stone, then the man moves the stone. But if Abraham begets Isaac, and Isaac 
begets Jacob, then it is not the case that Abraham begets Jacob. 

The scholastics all agree that an essential ordering of causes needs a first 
member, whereas an accidental ordering does not. Consider again the case 
where we explain the moving of object A by pointing to B. The idea here is 
that we have not really explained the movement of A if B is moved by C; at 
best we have only postponed the explanation of A’s movement, or better: we 
have now dressed it up as the question of how to explain B’s movement. Un- 
less we arrive at a first mover X, embodying the origin of the movement, the 
cry for an explanation will not be silenced and the explanation of A’s movement
will be woefully incomplete.38 The situation is entirely different in an
accidental ordering of causes. If we explain Jacob’s conception by referring
to the fact that Isaac made love to Rebecca, we have given a full and
satisfactory explanation. Of course, we could go further and ask for an
explanation of Isaac’s lovemaking. But such an explanation, as Patterson Brown
deftly notes, “will center on his actions with Rebecca, rather than on his having
been sired by Abraham.”39 Therefore, according to the Thomistic schoolmen,
a causal regress per se is vicious because an essential causal ordering needs
a first member; but a causal regress per accidens is harmless since an
accidental ordering can exist without a member that is the first. Aristotle,
however, when talking about causality, seems to have had in mind solely causal
orderings and regresses per se.


35 Brown 1966, 517.

36 Ibid.

37 Ibid. R.G. Wengert tried to formalize the transitivity of essentially ordered causes
by means of Gottlob Frege’s ancestral relation (Wengert 1971).

38 C.J.F. Williams argued that Thomas Aquinas in his Summa Theologiae commits
a petitio principii: by assuming that the only ‘movers’ are either first or second
movers, Thomas excludes by fiat the possibility that an infinite sequence may be
doing the moving (Williams 1960). J. Owens doubts whether Williams’ critique
“come[s] to grips with the argument of Aquinas in the argument’s own medieval
setting”, but he grants the point “as it stands from any concrete background and
time” (Owens 1962, 244).

Still, it is not at all easy to find out how exactly an essential causal order- 
ing differs from an accidental one. Is it because the former is transitive and 
the latter intransitive? That seems unlikely, for one can think of accidental 
causal orderings that are transitive. For example, Abraham is an ancestor of 
Isaac, and Isaac of Jacob, so Abraham is also an ancestor of Jacob. This ordering
is transitive, but it is not essential: it is not the case that Isaac’s being
an ancestor is caused by Abraham’s being an ancestor or that it causes Ja- 
cob’s being an ancestor. So while it is true that the Abraham-begetting-Isaac 
example is intransitive, the intransitivity might be a feature of the example, 
not of the fact that it illustrates an accidental causal ordering. Conversely, as 
Brown notes, the relation ‘A is moved by B’ need not always be used in a 
transitive manner.40
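The contrast between an intransitive relation and a transitive relation built from it can be checked mechanically. The Python sketch below is our own illustration, not Brown’s or Wengert’s: it encodes the begetting relation of the example as a hypothetical set of pairs and computes its ancestral in Frege’s sense, which coincides with what is nowadays called the transitive closure of the relation. ‘Begets’ comes out intransitive, whereas its ancestral, ‘is an ancestor of’, is transitive, which is the point of the ancestor example above.

```python
# Hypothetical encoding of the example: (x, y) means "x begot y".
BEGETS = {("Abraham", "Isaac"), ("Isaac", "Jacob"), ("Jacob", "Joseph")}


def is_transitive(rel):
    """True if whenever (a, b) and (b, d) are in rel, (a, d) is in rel too."""
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def ancestral(rel):
    """Frege's (strong) ancestral of rel, i.e. its transitive closure."""
    closure = set(rel)
    while True:
        # Compose the current closure with itself and keep only the new pairs.
        new_pairs = {(a, d) for (a, b) in closure
                     for (c, d) in closure if b == c} - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


ANCESTOR = ancestral(BEGETS)
print(is_transitive(BEGETS))             # False: Abraham did not beget Jacob
print(("Abraham", "Jacob") in ANCESTOR)  # True
print(is_transitive(ANCESTOR))           # True
```

Transitivity is a formal property of the relation only; the sketch says nothing about whether an ordering is essential or accidental, which is exactly why transitivity cannot by itself mark the distinction.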

Another difficulty, no less serious, concerns the question why exactly the
mediaeval schoolmen thought that an essential causal regress is vicious and 
an accidental causal regress is harmless. Why is it that a causal ordering per 
se needs a first member and a causal ordering per accidens does not? Brown 
discusses the possibility that it is simultaneity that does the trick. The idea is 
that, because causes in an essential ordering occur simultaneously (the man, 
the stick, and the stone all moving at the same time), it is impossible to have 
an infinite number of causes. For were we to allow an infinity of causes all 
happening instantaneously, we would defy Aristotle’s ban on actual infini- 
ties, and no true Aristotelian would ever go that far. In an accidental causal 
series, however, the causes are ordered chronologically and thus do not oc- 
cur at the same time; if they were to be infinite in number, they would form 
a potential, not an actual infinity. However, Brown argues that it is not the 
supposed simultaneity which requires that an essentially ordered series has 
a first term. His argument is strong: Aristotle and his followers themselves
explicitly deny that the argument for a first cause is related to the question
whether an infinite number of concurrent intermediate causes is possible or
not.41


39 Brown 1966, 523.

40 Ibid., 518.

What does the above excursion concerning causal regresses teach us about 
justificatory regresses? We have said that the hostility towards justificatory 
regresses, early and late, parallels a hostility towards causal regresses, es- 
pecially in Aristotle’s work. However, we have seen that it makes sense to 
distinguish between two different causal regresses (even if the distinction is 
not always crystal clear and even if it is unclear whether the dichotomy is 
exhaustive). Thus the question arises whether the same goes for justificatory
regresses. Do they admit of a similar dichotomy? Is a typical justificatory
regress more like a causal regress per se or is it more like a regress
per accidens? Does it resemble the vicious man-stick-stone example or is 
it similar to the benign Abraham-begets-Isaac paradigm? With respect to 
all these questions, the jury is still out. Some philosophers apparently have 
the intuition that justificatory regresses mirror the man-stick-stone example, 
transitivity and all: 


Consider a train of infinite length, in which each carriage moves because the 
one in front of it moves. Even supposing that that fact is an adequate expla- 
nation for the motion of each carriage, one is tempted to say, in the absence 
of a locomotive, that one still has no explanation for the motion of the whole. 
And that metaphor might aptly be transferred to the case of justification in 
general.⁴²


Others however hold that in justificatory regresses transitivity fails: 


[regressive transitivity] will often fail — for example in the much-discussed 
regress of reasons. For ... A₂ can afford a good reason for A₁’s acceptance,
and A₁ for q’s, without A₂ being a good reason to accept q.⁴³
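If, purely for illustration, we read ‘a good reason for’ as probabilistic support, that is, as raising the probability of what it supports, then a single roll of a fair die already yields the pattern Rescher describes. This reading, the fair die, and the three propositions below are our own assumptions, not Rescher's; the sketch only shows that such a pattern is consistent, not that it is typical.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each outcome has probability 1/6.
OUTCOMES = range(1, 7)

def prob(event):
    """P(event), where an event is a set of die outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if o in event), len(OUTCOMES))

def cond(event, given):
    """P(event | given); assumes P(given) > 0."""
    return prob(event & given) / prob(given)

def supports(reason, target):
    """Illustrative reading of 'good reason': reason raises the probability of target."""
    return cond(target, reason) > prob(target)

# Hypothetical propositions about a single roll.
A2 = {1}         # "the die shows 1"
A1 = {1, 2, 3}   # "the die shows at most 3"
q = {2, 3}       # "the die shows 2 or 3"

print(supports(A2, A1))  # True:  P(A1 | A2) = 1   > P(A1) = 1/2
print(supports(A1, q))   # True:  P(q | A1) = 2/3 > P(q) = 1/3
print(supports(A2, q))   # False: P(q | A2) = 0   < P(q) = 1/3
```

A₂ supports A₁ and A₁ supports q, yet A₂ lowers the probability of q to zero, so support in this sense is not transitive.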


To complicate the matter still further, contemporary epistemologists dis- 
cussing infinitism generated their own paradigm cases. One involves the 
analogy with basketball players throwing around the ball: 


41 Ibid., 520. Brown hypothesizes that the concept of responsibility has something
to do with it. Calling to mind the etymology of ‘cause’ (which goes back to the
Greek ‘aitia’, a term that occurs mainly in legal contexts), Brown argues that it is 
precisely the connotation of ‘cause’ as something that is responsible for its effect 
that is crucial here: an essentially ordered series needs a first member because it 
needs a member that is responsible for the entire series. 

42 Hankinson 1995, 189. 

43 Rescher 2010, 83, footnote 1. We have changed the symbols so as to make them 
match ours. 
Consider the analogy of basketball players ... passing the ball to another.
... the question is this: how did [the ball] get there in the first place?⁴⁴


Here epistemic justification which goes from one proposition to another is 
compared to a ball that is passed from one player to another. Is this a helpful
picture? We are not so sure: the picture suggests that justification is something
that is lost once it is handed over to the neighbouring proposition, and this is
not something that we associate with justification. We do not believe that,
if Aⱼ justifies Aᵢ, the former thereby loses the property of being justified.
Quite the opposite. In this respect justification seems more like dharma
transmission or like an infectious disease: holy man Aⱼ can impart dharma to
person Aᵢ without losing his holiness, just as the sick person Aⱼ can pass on
his infection to person Aᵢ without thereby being cured.⁴⁵

In logic and mathematics, a necessary condition for establishing whether 
a series continues indefinitely is to know the domain and the relation in
question.⁴⁶ Take the formula ∀x∃yRxy. Whether this formula is true or false
depends on the domain over which the variables x and y range and on the nature
of the relation R. That is, we need to know what the objects in the series are 
and also what the relation between those objects is. The statement S: ‘For all 
objects x there is an object y such that y is smaller (or less) than x’ is true if x 
and y are integers; S then unproblematically covers an infinitude of objects. 
But if x and y are natural numbers, then S is false, since there is a smallest 
natural number. However, if we change S into S’: ‘For all objects x there is 
an object y such that y is greater than x’, then we obtain a truth even with the 
interpretation of x and y as natural numbers. This illustrates that not only the 
character of the objects is important, but the nature of the relation between 
the objects too. 
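The role of the domain and of the relation can be made concrete in a small Python sketch. Because no program can search an infinite domain, the sketch looks for witnesses inside a finite window. The window size of 100 is arbitrary, so the sketch illustrates the three claims about S and S′ rather than proving them. It also takes 0 as the smallest natural number; starting at 1 would change nothing.

```python
# A finite window stands in for each infinite domain (an assumption made
# for illustration only; a finite search cannot prove a claim about all x).
WINDOW = 100
naturals = range(0, WINDOW)          # 0, 1, 2, ...
integers = range(-WINDOW, WINDOW)    # ..., -1, 0, 1, ...

def smaller(x, y):
    return y < x

def greater(x, y):
    return y > x

def witness(x, domain, relation):
    """Return some y in domain with relation(x, y), or None if the window has none."""
    return next((y for y in domain if relation(x, y)), None)

def holds_for(xs, domain, relation):
    """Check 'for every x in xs there is a y in domain with relation(x, y)'."""
    return all(witness(x, domain, relation) is not None for x in xs)

print(holds_for(range(-10, 10), integers, smaller))  # True: S over the integers
print(witness(0, naturals, smaller))                 # None: 0 has no smaller natural
print(holds_for(range(10), naturals, greater))       # True: S' over the naturals
```

The same quantifier pattern succeeds or fails depending on which domain is searched and which relation is tested.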

As do the causal cases, these mathematical considerations intimate that, 
also in the field of epistemic justification, we must at least make clear what 
is meant by the Aₙ, the objects, and by ←, the arrow which symbolizes
the relation between the objects. What are reasons in a justificatory chain? 
And how are they related? Only after having settled these questions could 


44 Klein 2011b, 494.

45 John Turri also noted that ‘justification’ does not imply that something gets lost
(Turri 2014, 222). However, he reserves the word ‘transmission’ for cases in which
something does get lost. In Turri’s terminology, if a property gets transmitted from
Aⱼ to Aᵢ, this means that Aⱼ loses the property while Aᵢ receives it. Our use of
‘transmission’ is different, in that it does not imply that Aⱼ no longer has the property.

46 Cf. Beth 1959, Chapter 1, Section 4. 
we hope to assess whether a justificatory chain of infinite length is sensible
or nonsensical, and we will address these matters in the next chapter. 
Chapter 2
Epistemic Justification 


Abstract 

What is the nature of the justifier and of the justified, and how are they re- 
lated? The answers to these questions depend on whether one embraces inter- 
nalism or externalism. As far as the formal side of the justification relation 
is concerned, however, the difference between internalism and externalism 
seems irrelevant. Roughly, there are three proposals for the formal relation. 
One of them conceives the justification relation as probabilistic support; in 
fact, however, probabilistic support is only a necessary and not a sufficient 
condition for justification. 


2.1 Making a Concept Clear 


In philosophy concepts typically resist definition. Truth, justice, beauty, free- 
dom, goodness: each of these notions is as fundamental as it is enigmatic. It 
has been argued that the perennial attempts to define these terms are part 
and parcel of the philosophical game, setting philosophy apart from science. 
Thus Kant maintained that the way in which philosophers define their con- 
cepts differs essentially from the way in which definitions are given outside 
of philosophy. Defining a term in mathematics or the sciences, as he writes 
in his Logic, is “to make a clear concept” whereas defining a philosophical 
concept is “to make a concept clear”.¹ Giving a definition of the mathematical
term ‘trapezium’, for example, amounts to combining previously existing
and supposedly unambiguous notions like ‘parallel’, ‘angle’, and ‘side’ into 


1 Einen deutlichen Begriff machen versus einen Begriff deutlich machen (Kant,
Logik, Einleitung, VIII C; see Jäsche 1869/1800, 70).
the newly constructed concept ‘trapezium’. In defining a philosophical term,
however, we do not construct a new term out of already existing elements, but 
rather try to reconstruct and clarify what is already given to us in a confused 
and ill-determined way. While in philosophy we have a vague understanding 
of the definiendum and strive to come up with a definiens that agrees with 
what we have in mind, in the sciences we fabricate a definiendum on the 
basis of a clear and existing definiens. 

Although this Kantian view of philosophy dates back to Plato and was still 
upheld in the twentieth century by such major figures as Rudolf Carnap, it is 
far from uncontroversial. Especially in the late twentieth century, pragmatists
and naturalists criticized it for its alleged sterility, and for its failure to
appreciate the continuity between science and philosophy. Richard Rorty expresses
the point unreservedly: 


Pragmatists think that the story of attempts ...to define the word ‘true’ or 
‘good’ supports their suspicion that there is no interesting work to be done in 
this area. It might, of course, have turned out otherwise. People have, oddly 
enough, found something interesting to say about the essence of Force and the 
definition of ‘number’. They might have found something interesting to say 
about the essence of Truth. But in fact they haven’t. ...[P]ragmatists see the 
Platonic tradition as having outlived its usefulness.²


Our aim in this chapter is to say something interesting about the concept 
of epistemic justification. The subject has been in a predicament ever since 
Plato in his Theaetetus set out to answer the question: “What is the
distinction between knowledge and true belief?” Nowadays the debate about
epistemic justification has become a multi-faceted affair, consisting of many sub-
and sub-sub-discussions. More than once the participants in the debate have 
crossed the borders of epistemology in order to continue their arguments in 
the fields of ethics, metaphysics, or philosophy of mind, thus creating a com- 
plicated and colourful network of various positions with myriad connections 
and interdependencies. 

Some are pessimistic that progress can be achieved here. Roderick Chisholm
discouragingly comments on Plato’s undertaking in the Theaetetus: “It
is doubtful that he succeeded and it is certain that we cannot do any better.”³
William Alston, as we have seen, is even more explicit. He firmly recommends
the abandonment of the project of trying to understand justification
altogether, and makes a plea for an epistemology that studies various ‘epistemic
desiderata’. Like Rorty, he is clearly annoyed with a project that has
demanded so much from so many and has delivered so little.⁴


2 Rorty 1982, xiv.
3 Chisholm 1966, 5.

Our concern with epistemic justification in this book is in fact secondary. 
When we aim to say something interesting about the issue, this is not be- 
cause we aspire to define epistemic justification as such. Rather, we want to 
find out whether, and if so to what extent, it makes sense to speak of infinite 
justificatory chains. The major proponent of infinite chains in epistemology, 
Peter Klein, has argued that the latter objective can be achieved without the 
former. As he sees it, we need not unduly exert ourselves to understand justi- 
fication, for the meaning of epistemic justification is irrelevant to a discussion 
about the possibility of infinite justificatory chains. We can merrily discuss 
the pros and cons of infinitism without worrying about what exactly justifi- 
cation means, since infinitism is consistent with the many different accounts 
of the expression ‘Aⱼ justifies Aᵢ’. Klein lists five of those accounts, adding
that the list is not exhaustive: 


1. if Aⱼ is probable, then Aᵢ is probable and if Aⱼ is not probable, then Aᵢ is
not probable; or

2. in the long run, Aⱼ would be accepted as a reason for Aᵢ by the appropriate
epistemic community; or

3. Aⱼ would be offered as a reason for Aᵢ by an epistemically virtuous
individual; or

4. believing that Aᵢ on the basis of Aⱼ is in accord with one’s most basic
commitments; or

5. if Aⱼ were true, Aᵢ would be true, and if Aⱼ were not true, Aᵢ would not be
true.⁵


4 Alston’s exasperation is rooted in his belief that “[t]here isn’t any unique,
epistemically crucial property of beliefs picked out by ‘justified’” (Alston 2005a, 22). Cf.
“...it is a mistake to suppose that there is a unique something-or-other called ‘epis- 
temic justification’ concerning which the disputants are disputing” (Alston 1993, 
534). See also Alston 1989, where similar ideas are defended. Likewise, Richard 
Swinburne has denied that there exists one pre-theoretic concept ‘epistemic justi- 
fication’, which can subsequently be made clear in the way that Plato and Kant 
proposed (Swinburne 2001). But where Alston advises us to withdraw from justifi- 
cation research and to stop talking about justification altogether, Swinburne encour- 
ages us to let a thousand justificatory flowers bloom. The two positions may seem 
to be poles apart, but in a sense there is much overlap. In the end, Swinburne’s plu- 
ralistic view of epistemic justification and Alston’s plea for a plurality of epistemic 
desiderata might perhaps differ only terminologically. 

5 Klein 2007a, 12. Klein has p and q instead of Aⱼ and Aᵢ.
He concludes that “infinitism can opt for whatever turns out to be the best
account since each of them is compatible with what infinitism is committed
to.”⁶ As we have indicated in the previous chapter, we think this is too
quick. Whether an infinite regress of justification makes sense may very well 
depend on the meaning of justification. We have seen that under a partic- 
ular interpretation of ‘x causes y’ or ‘x is smaller than y’ regresses may be 
harmless, whereas under an alternative interpretation they do not make sense. 
Something similar might well apply to the case of epistemic regresses, so it 
is incumbent upon us to consider what may be meant by ‘Aⱼ justifies Aᵢ’.

In doing so, we are not trying to give a definition of justification. Nor are 
we making any claim about its relation to knowledge. Traditionally, justifi- 
cation has been seen as a necessary ingredient of knowledge, as that which 
has to be added to true belief. Recently, different views have been put for- 
ward, such as that justification is a derivative of the primitive concept of 
knowledge, or is possible knowledge, or potential knowledge, or appearance 
of knowledge, or that it implies truth, or that justification and knowledge 
simply coincide.⁷ We will make no such claims. Everything we say about
justification is meant as a contribution to the debate about the possibility of 
epistemic regresses, not to the debate about how to define knowledge or jus- 
tification. Of course, the two issues are connected, but it would be a mistake 
to treat them as being on a par.⁸ Our aim, as said, is to find out to what extent
infinite epistemic chains are possible. It will turn out that for this purpose it 
is enough to adopt a very modest and uncontroversial claim about justifica- 
tion; there is no need to define justification or to say how exactly it relates to 
knowledge. 


6 Ibid. 

7 Williamson 2000; Bird 2007; Jenkins Ichikawa 2014; Reynolds 2013; Littlejohn 
2012; Sutton 2007. We will shun the term ‘warrant’ when we speak of justification, 
since some reserve this term for ‘that which added to true belief yields knowledge’. 
See Plantinga 1993. 

8 That a theory of justification is different from a theory of knowledge has been 
argued in Booth 2011 and Foley 2012. Alvin Goldman also acknowledges that an 
interest in justification can have several different motivations, only one of which 
is an interest in knowledge as such (Goldman 1986, 4). Martin Smith, however, 
defends what he calls “the normative coincidence constraint”, according to which
aiming at justification and aiming at knowledge coincide. We will say more about
Smith’s views in Section 2.5.
The first thing to note when considering the concept of epistemic justification
is that it is a relational notion, corresponding to a two-place predicate. When 
we say that something is justified, we mean that it is justified by something 
else. This ‘something else’ can be of the same ontological category as the 
thing justified, as for example when a belief justifies another belief. But it 
can also belong to another category, as when we say that a belief is justified 
by an experience or by an event. It might also happen that something justifies 
itself, and then the justifier and the thing justified coincide. My belief that I 
am having a belief, for example, falls into the latter category. 

Once the relational character of justification has been acknowledged, we 
can appreciate that the question ‘What is justification?’ in fact consists of two 
questions. If we want to know what is meant by the expression ‘Aⱼ justifies
Aᵢ’, we will have to answer both Q1 and Q2:


Q1: What is the character of the relata, Aⱼ and Aᵢ?
Q2: What is the character of the relation between Aⱼ and Aᵢ?


The difference is clear.⁹ Q1 is about the ontology of reasons. It is a question
about the stuff that the objects Aⱼ and Aᵢ are made of. Are they abstract
entities like propositions? Psychological entities like beliefs? Or are they events,
or facts, or material objects? Question Q2 is about their connection. In the
previous chapter we symbolized this connection by a single arrow, but what
does this symbol mean? Is it the arrow of entailment, as has been argued
by, for example, James Cornman and John Post?¹⁰ Does it represent
‘probabilification’, as William Alston and Matthias Steup have called it?¹¹ Is the


9 Others have also stressed this difference. Andrew Cling, for example, writes that “a
theory of ... reasons ... must do two things. First, it must give an account of the
relationship that must obtain between ... reasons and their specific targets. ... Second,
[it] must specify the characteristics that a mental state must have if it is to be a reason
for any target.” (Cling 2014, 62). Similarly, Ram Neta distinguishes between “reasons
in the light of which a claim is justified” and “the relation ... between those
reasons” (Neta 2014, 160). The very distinction is also central to Richard Fumerton’s
‘Principle of Inferential Reasoning’, which we will discuss in Section 8.5.

10 Cornman 1977; Post 1980.

11 Alston 1993, 528; Steup 2005, Section 2.1. Regarding the question about the
relation between Aⱼ and Aᵢ, Michael Williams takes a very different view. As he sees
it, “there is no relation to account for” and he comments further: “There may well 
be relations of entailment ...or conditional probability .... But no such relation 
justification relation primarily a logical relation, as Richard Feldman and
Earl Conee maintain?¹² Or should we follow David Armstrong and Alvin
Goldman and hold that it is ultimately causal?¹³

Which answer one gives to Q1 and Q2 depends largely on whether
one takes an internalist or an externalist view of epistemic justification.
Our intuitive understanding of epistemic justification, which Kant would
have called “confused and undetermined”, revolves around two aspects that
philosophers of all times have struggled to amalgamate.¹⁴ On the one hand,
justification has to do with the way the world is: it would be inappropri- 
ate to call our beliefs justified without requiring that they represent, at least 
remotely, how things actually are. On the other hand, justification applies 
to the way the world appears to us: it would be awkward to call my be- 
liefs unjustified if I have reasoned impeccably towards a conclusion which, 
through some freakish turn of fate, happens to be false. The fact that exter- 
nalists tend to stress the former, world-centred aspect of justification, while 
internalists emphasize the latter, agent-centred aspect, is reflected in their
answers to Q1 and Q2.

What do internalists and externalists say about the ontological status of the 
relata in ‘Aⱼ justifies Aᵢ’? Concerning Aᵢ, the thing justified, there seems to
be little disagreement. In the case at hand, both factions assert that Aᵢ is
a proposition, or a belief in a proposition.¹⁵ But what about the ontological
status of the justifier, Aⱼ? Here the answer depends on which of the many
different versions of internalism or externalism we are talking about. It also
depends on whether Aⱼ is regarded as something that is itself inferred or as


suffices to make a proposition a reason for another” (Williams 2014, 237). As will 
become clear later in this chapter, we agree that a relation of entailment or of con- 
ditional probability is not sufficient for saying that one proposition is a reason for 
another. However, if Williams is implying that such a relation is not necessary ei- 
ther, then we part company. In general, Williams’ approach to the epistemic regress 
problem is inspired by the later Wittgenstein and by ordinary language philosophers 
like Austin, and as such tends to eschew a more formal or theoretical approach, like 
the one that we pursue in this book. 

12 Feldman and Conee 1985; Conee and Feldman 2004. 

13 Armstrong 1973; Goldman 1967. 

14 Verworren und unbestimmt: see Kant’s treatise ‘Enquiry concerning the clarity
of the principles of natural theology and ethics’ (Untersuchung über die Deutlichkeit
der Grundsätze der natürlichen Theologie und der Moral) of 1764. Cf. Vahid 2011.
We recall agreeable conversations with Hans Mooij and Simone Mooij-Valk about
translating Kant and, vis-à-vis the motto of this book, Thomas Mann.

15 But only in the case at hand, for Aᵢ can also be a cognitive state other than a
belief, or even a non-cognitive state.
something that is non-inferential. If Aⱼ is itself inferred, all internalists and
some externalists will see Aⱼ too as a belief or a proposition. If Aⱼ is not
inferred, some internalists will see it as a belief or a proposition (albeit of a
special, basic kind), whereas other internalists maintain that in this case Aⱼ
is a fact or an experience. Externalists will regard Aⱼ in this case as a fact,
an object, or an event, but they differ in their opinions about what kind of
fact, object, or event Aⱼ exactly is. Some say that it is a fact outside us; for
example, if an airplane is flying by, and my perceptual and cognitive wiring 
is as it should be, then this fact is a reason for me to believe that an airplane 
is flying by. Other externalists hold that it is a fact inside us, for example the 
activation of my retina or my eardrum, causing neural events and brain states 
culminating in my belief that an airplane is flying by. 

As to Q2, internalists tend to be evidentialists: they see the relation
between the relata in ‘Aⱼ justifies Aᵢ’ as being logical or conceptual in
character. For them, Aⱼ is a good epistemic reason for Aᵢ if Aⱼ is adequate
evidence for Aᵢ. As Earl Conee and Richard Feldman phrase it:


The evidentialism we defend ... holds that the epistemic justification of a
belief is a function of evidence.¹⁶


According to evidentialism, a person is justified in believing a proposition 
when the person’s evidence better supports believing that proposition than it 
supports disbelieving it or suspending judgement about it. ... when a belief is 
based on justifying evidence, then ... the belief is well-founded.¹⁷


Externalists, on the other hand, are mostly reliabilists: in their view, Aⱼ is
a good epistemic reason for Aᵢ if and only if Aᵢ has been reliably formed
on the basis of Aⱼ, where a belief-forming method is reliable if it results in
acquiring true beliefs and avoiding erroneous ones. Reliabilists see the
reliability relation as being nomological or even causal in nature.¹⁸ They criticize
evidentialists for neglecting the difference between logic and epistemology, 
stressing that, while logic deals with inferences and the validity of argument 
forms, epistemology has to do with the practice of forming actual beliefs. As 
one of the pioneering reliabilists has it: 


... although epistemology is interested in inference, it is not (primarily) in- 
terested in inferences construed as argument forms. Rather, it is interested in 


16 Conee and Feldman 2004, 2. 

17 Ibid., 3. 

18 One of the first papers that stresses the difference between logical and causal
relations is Davidson 1963, although it does not contain the terms ‘evidentialism’ or 
‘reliabilism’ and is about rational action rather than about justified belief. 
inferences as processes of belief formation or belief revision, as sequences of
psychological states. So psychological processes are certainly a point of con- 
cern, even in the matter of inference. Furthermore, additional psychological 
processes are of equal epistemic significance: processes of perception, mem- 
ory, problem solving, and the like. 

Why is epistemology interested in these processes? One reason is its in- 
terest in epistemic justification. The notion of justification is directed, prin- 
cipally, at beliefs. But evaluations of beliefs ...derive from evaluations of 
belief-forming processes. Which processes are suitable cannot be certified by 
logic alone. Ultimately, justificational status depends (at least in part) on prop- 
erties of our basic equipment. Hence, epistemology needs to examine this 
equipment, to see whether it satisfies standards of justifiedness.¹⁹


These standards of justifiedness are given by the “right system of
justificational rules”, or J-rules.²⁰ No J-rule can be generated by logic alone, the
main reason being that J-rules govern the transitions to states of belief, and 
that logic is not about such states: 


... logic formulates rules of inference, which appear in both axiomatic sys- 
tems and natural deduction systems. But these rules are not belief-formation 
rules. They are simply rules for writing down formulas. Furthermore, formal 
logic does not really endorse any inference rules it surveys. It just tells us 
semantic or proof-theoretic properties of such rules. This is undoubtedly rele- 
vant to belief-forming principles ... But it does not in itself tell us whether, or 
how, such rules may be used in belief formation.²¹


In the end, Goldman opts for what he calls “the absolute, resource-independ- 
ent criterion of justifiedness”: 


A J-rule system R is right if and only if R permits certain (basic) psychological 
processes, and the instantiation of these processes would result in a truth ratio 
of beliefs that meets some specified high threshold (greater than .50).²²
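Goldman's criterion can be pictured as a simple ratio test. The sketch below is only a toy. It assumes a finite record of beliefs produced by some process, each marked true or false. The records and the threshold of 4/5 are hypothetical. It also ignores Goldman's restriction to ‘normal worlds’, along with everything concerning which psychological processes a rule system R permits.

```python
from fractions import Fraction

def truth_ratio(outcomes):
    """Proportion of true beliefs in a record (True = true belief)."""
    if not outcomes:
        raise ValueError("no beliefs to evaluate")
    return Fraction(sum(outcomes), len(outcomes))

def meets_threshold(outcomes, threshold):
    """Toy version of the test: the truth ratio must reach the threshold,
    and the threshold itself must be greater than 1/2."""
    if threshold <= Fraction(1, 2):
        raise ValueError("the threshold must be greater than .50")
    return truth_ratio(outcomes) >= threshold

# Hypothetical belief records for two processes.
process_a = [True] * 9 + [False]      # truth ratio 9/10
process_b = [True] * 5 + [False] * 5  # truth ratio 1/2

print(meets_threshold(process_a, Fraction(4, 5)))  # True
print(meets_threshold(process_b, Fraction(4, 5)))  # False
```

The check deliberately separates two conditions: the threshold must exceed .50, and the ratio must then reach it.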


Evidentialism and reliabilism are usually described as opposing positions,
but recently arguments have been put forward for a rapprochement between
the two, including some arguments by Goldman himself.²³ We similarly
advocate a reconciliation, but our argument is different from the existing ones
in that it relies on the formal side of the justification relation.


19 Goldman 1986, 4.

20 Ibid., 59.

21 Ibid., 82.

22 Ibid., 106. Goldman adds that the rightness of the rule system R should be assessed
in the set of “normal worlds”, i.e. “worlds consistent with our general beliefs about the
actual world”. (Ibid., 107.)

23 Goldman 2011; Comesaña 2010. Alston 2005a, Chapter 6, defends the thesis that
evidentialism and reliabilism are virtually identical.

Neither evidentialists nor reliabilists have been very explicit about this 
formal side. Conee and Feldman, when referring to the relation, speak about 
‘fittingness’: a belief in a proposition is justified for a particular person if and 
only if that belief fits the person’s evidence.²⁴ They explicitly refrain from
describing the fitting relation in formal detail, presumably because they want
to keep their analysis as general as possible. Reliabilists, of course, are not
particularly interested in the formal side of the justification relation either, 
since they consider it inessential to the actual process of acquiring a justified 
belief. In contrast to both groups, we deem it fruitful to investigate the formal 
structure of the justification relation. As we will see, this will enable us to 
reconstruct the evidentialist and the reliabilist view as two interpretations of 
one and the same formal framework. 


2.3 Entailment 


When it comes to the formal side of the justification relation, we can perceive 
in the literature three major proposals. According to the first, ‘Aⱼ justifies Aᵢ’
should be read as


‘Aⱼ implies Aᵢ’ or ‘Aⱼ entails Aᵢ’.
According to the second, we should interpret it as
‘Aⱼ probabilifies Aᵢ’ or ‘Aⱼ makes Aᵢ probable’.


The third proposal is based on work by Fred Dretske and Robert Nozick,
and it is sometimes referred to as truth-tracking.²⁵ Roughly, it states that a
person is justified in believing proposition Aᵢ if this person tracks the truth
of Aᵢ, which in this case means: bases his belief in Aᵢ on Aⱼ. On a formal
level, the truth-tracking approach makes use of subjunctive conditionals of
the form


(a) ‘if Aⱼ were the case, then Aᵢ would be the case’
(b) ‘if Aⱼ were not the case, then Aᵢ would not be the case’.


24 Conee and Feldman 2004, Chapter 4. 
25 Dretske 1970, 1971; Nozick 1981. 
Nozick argues that the subjunctive conditional (a) leads to the probability
P(Aᵢ|Aⱼ) = 1, and that conditional (b) leads to P(¬Aᵢ|¬Aⱼ) = 1, or
equivalently P(Aᵢ|¬Aⱼ) = 0. This corresponds to what he calls ‘strong truth
tracking’. As he rightly comments, “the evidence we have for hypotheses is not
usually strong evidence; too often although the evidence would hold if [the
hypothesis] were true, it might also hold if [the hypothesis] were false.”²⁶
He is then led to a probabilistic approach that is the same as the second
proposal. However, rather than framing the subjunctive conditionals in
probability statements, it may be more natural to couch them in the language of
possible world semantics, invoking David Lewis’s method of nearby possible
worlds.²⁷
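Nozick's probabilistic rendering can be checked against a joint distribution over the truth values of Aⱼ and Aᵢ. The two distributions in the sketch below are invented for illustration. The function assumes that 0 < P(Aⱼ) < 1, so that both conditional probabilities are defined. The second distribution is a case in which Aᵢ is sometimes true even though Aⱼ is false, so strong truth tracking fails.

```python
from fractions import Fraction

def conditionals(joint):
    """Return (P(A_i | A_j), P(A_i | not A_j)).

    `joint` maps (A_j, A_i) truth-value pairs to probabilities; absent pairs
    count as probability 0. Assumes 0 < P(A_j) < 1.
    """
    def prob(pred):
        return sum((p for (aj, ai), p in joint.items() if pred(aj, ai)), Fraction(0))

    p_j = prob(lambda aj, ai: aj)
    p_not_j = prob(lambda aj, ai: not aj)
    p_i_given_j = prob(lambda aj, ai: aj and ai) / p_j
    p_i_given_not_j = prob(lambda aj, ai: (not aj) and ai) / p_not_j
    return p_i_given_j, p_i_given_not_j

def strong_tracking(joint):
    """Strong truth tracking: P(A_i | A_j) = 1 and P(A_i | not A_j) = 0,
    the latter being equivalent to P(not A_i | not A_j) = 1."""
    p_i_given_j, p_i_given_not_j = conditionals(joint)
    return p_i_given_j == 1 and p_i_given_not_j == 0

# Hypothetical distributions, keyed by (A_j, A_i).
strong = {(True, True): Fraction(3, 10), (False, False): Fraction(7, 10)}
weak = {(True, True): Fraction(3, 10),
        (False, True): Fraction(1, 10),
        (False, False): Fraction(6, 10)}

print(strong_tracking(strong))  # True
print(conditionals(weak))       # (Fraction(1, 1), Fraction(1, 7))
print(strong_tracking(weak))    # False
```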

We will say something about the first proposal in the present section. The 
second one will be discussed in Section 2.4, where we argue that probabilis- 
tic support is a necessary but not a sufficient condition for justification. We 
will say more about the third proposal in Section 2.5, where we consider an 
argument by Martin Smith that can be seen as an objection to our argument 
in Section 2.4.

The idea that justification has something to do with implication or en- 
tailment appears to be widely accepted. Aristotle assumes it in his writings 
on epistemic regresses, and many epistemologists in the twentieth century 
who write about justification seem to have had implication in mind. ‘Seem’, 
for the idea often remains implicit. This goes for the literature on epistemic 
justification in general, but also for the more specific papers on the regress 
problem in epistemology. For example, Tim Oakley develops an argument 
according to which no beliefs can be justified since that would require an 
infinite regress. Before presenting his argument, he writes: 


I offer no analysis of the term ‘justified’, since this is not required for my 
argument, and take the notion to be a commonsense one, regularly though 
unreflectively used by us all.²⁸


Oakley’s paper, however, makes it very clear that at least part of his argument
only works when justification is taken as implication or as deductive
inference. Thus Scott Aikin rightly notes that in Oakley’s argument “deductive
inference rules play the role of inferential justification”.²⁹ And Oakley is no
exception here. Among authors who defend the sceptical position that no be- 
lief can be justified because that would demand an infinite epistemic chain, 


26 Nozick 1981, 250. 

27 Williamson 2000; Pritchard 2005, 2007, 2008; Sosa 1999a, 1999b. 
28 Oakley 1976, 221. 

29 Aikin 2011, 59; Oakley 1976, Sections 4.3 and 5.3. 
many tacitly make the assumption that ‘Aⱼ justifies Aᵢ’ means ‘Aⱼ implies
Aᵢ’.

Occasionally, however, authors are explicit about their use of the
justification relation as implication or entailment. John Post is a case in point.³⁰
Post first takes justification to be inferential justification and then notes: 


If anything counts as an inferential justification relation, logical implication 
does ... 
provided it satisfies appropriate relevance and noncircularity arguments.³¹


More particularly, Post sees justification as “proper entailment”: 


Let us say a statement Aⱼ properly entails a statement Aᵢ iff Aⱼ semantically
entails Aᵢ, where the entailment is relevant and non-circular on any
appropriate account. Thus if anything counts as an inferential justification relation,
proper entailment does, in the sense that where Aⱼ and Aᵢ are statements rather
than sets of statements: ‘If Aⱼ properly entails Aᵢ, then Aᵢ is justified for [a
person] P if Aⱼ is, provided P knows that the proper entailment holds and
would believe Aᵢ in the light of it if he believed Aⱼ.’³²


There exist many cases of proper entailment à la Post. The example that he himself presents is based on modus ponens. If $A_j$ is

    $p \wedge (p \rightarrow q)$,

and $A_i$ is $q$, then $A_j$ properly entails $A_i$.
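Only the ‘semantically entails’ half of Post’s definition is mechanical, so it can be checked by brute force over truth tables. The following Python sketch (the helper names `implies`, `entails`, `a_j` and `a_i` are ours, not Post’s) confirms that $p \wedge (p \rightarrow q)$ entails $q$ but not conversely; it makes no attempt to capture his further requirements of relevance and non-circularity.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional a -> b."""
    return (not a) or b


def entails(premise, conclusion, n_vars: int) -> bool:
    """Semantic entailment by truth table: every valuation of the n_vars
    atoms that makes `premise` true also makes `conclusion` true."""
    return all(
        conclusion(*valuation)
        for valuation in product([True, False], repeat=n_vars)
        if premise(*valuation)
    )


def a_j(p: bool, q: bool) -> bool:
    """A_j: p and (p -> q)."""
    return p and implies(p, q)


def a_i(p: bool, q: bool) -> bool:
    """A_i: q."""
    return q


print(entails(a_j, a_i, 2))  # True: modus ponens
print(entails(a_i, a_j, 2))  # False: q can be true while p is false
```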

To regard justification as implication or entailment has the advantage (if it is one) that justification is transitive and truth-conducive. However, it has been rightly criticized as a view that puts very strong requirements on the notion of justification and that may typically lead to scepticism if rigorously implemented. In 1978 Richard Foley had already made a plea for allowing “non-paradigmatically justified beliefs”, i.e. beliefs whose justification is not subject to such strong requirements as those that follow from straightforward implication.³³ Foley leaves open what exactly he means by non-paradigmatic justification, refraining from giving a general account of the phenomenon, and even doubting whether such an account can be given at all.

30 Post 1980. In this paper Post describes a particular objection to infinite regresses that we will discuss in Section 6.3. Post’s argument can be seen as an improved version of objections that have been raised by John Pollock and James Cornman (Pollock 1974; Cornman 1977).

31 Post 1980, 33.

32 Ibid. We have replaced Post’s X and Y by our $A_j$ and $A_i$. Post talks about “statements” where we use ‘propositions’ or ‘beliefs’. In this chapter we will not distinguish between the latter two terms.

33 Foley 1978, 316.

The second proposal for a formal rendering of ‘$A_j$ justifies $A_i$’, to be discussed below, is more important and more realistic; in fact, it goes some way towards specifying the non-paradigmatic account of justification that Foley has been looking for.


2.4 Probabilistic Support 


A distinction is often made between deontological and non-deontological 
justification. In the deontological understanding, as Matthias Steup phrases 
it, a person S “is justified in believing that r if and only if S believes that r 
while it is not the case that S is obliged to refrain from believing that r.”³⁴
Steup notes that the deontological concept “is common to the way philoso- 
phers such as Descartes, Locke, Moore and Chisholm have thought about 
justification”, but that today it is deemed “unsuitable for the purposes of 
epistemology”. What is deemed suitable today is the non-deontological view, 
which conceives justification as “probabilification”: 


What does it mean for a belief to be justified in the non-deontological sense? 
Recall that the role assigned to justification is that of ensuring that a true belief 
isn’t true merely by accident. Let us say that this is accomplished when a 
true belief instantiates the property of proper probabilification. We may, then, 
define non-deontological justification as follows: 


[Person] S is justified in believing r if and only if S believes that r on a 
basis that properly probabilifies S’s belief that r.³⁵


Instead of ‘probabilification’, epistemologists also use the term ‘to make probable’ for justification. Richard Fumerton, for example, says:


Can we find a way of characterizing epistemic justification that is relatively neutral with respect to opposing analyses of the concept? As a first stab we might suggest that whatever else epistemic justification for believing some proposition is, it must make probable the truth of the proposition believed. The patient with prudential reasons for believing in a recovery was more likely to get that recovery as a result of her beliefs, but the prudential reasons possessed did not increase the probability of the proposition believed — it was the belief for which the person had prudential reasons that resulted in the increased probability. Epistemic reasons make likely the truth of what is supported by those reasons ... .³⁶


34 Steup 2005. Steup has p for our r.

35 Ibid. The importance of probabilification has also been stressed earlier by William Alston, albeit not as a way of understanding justification, but as one of the epistemic desiderata that deserve thorough study: “The reason or its content must be so related to the target belief and its content that, given the truth of the former, the latter is thereby likely to be true. The reason must sufficiently ‘probabilify’ the target belief.” (Alston 1993, 528).


Here we shall work under the assumption that ‘probabilification’ or ‘making probable’ is essential for the concept of justification. To say that $A_j$ makes $A_i$ probable at least means that $A_j$ raises the probability of $A_i$ if $A_j$ is true, as compared with the value it would have had if $A_j$ had been false. So

    $P(A_i|A_j) > P(A_i|\neg A_j)$,   (2.1)

in words: $A_i$ is more probable if $A_j$ is the case than if $A_j$ is not the case.³⁷ We say that $A_j$ makes $A_i$ more probable if and only if (2.1) is fulfilled. Here we assume that $P(A_j)$ lies strictly between zero and one, so that both conditional probabilities in (2.1) are defined, but in later chapters we will see how to drop this assumption.

We will call (2.1) the condition of probabilistic support. It is in fact equivalent to the classificatory version of what Rudolf Carnap, in the preface to the second edition of his Logical Foundations of Probability, calls “increase in firmness”.³⁸ While Carnap’s concept of “firmness” is concerned with how probable $A_i$ is on the basis of $A_j$, his notion of “increase in firmness” relates to the question whether, and by how much, the probability of $A_i$ is increased by the evidence $A_j$. Carnap specifies, both for firmness and for increase in firmness, three versions: a classificatory, a comparative and a quantitative variant. In the classificatory variant of increase in firmness, $A_i$ is made firmer by $A_j$. Or in Carnap’s formulation (where we have replaced his $c$ by our $P$):

    $P(A_i|A_j) > P(A_i|t)$.   (2.2)

Here $t$ is the tautology, so (2.2) is the same as

    $P(A_i|A_j) > P(A_i)$,   (2.3)

which is equivalent to our condition of probabilistic support, (2.1), for while $P(A_i|\neg A_j)$ and $P(A_i)$ will in general not be equal to one another, the two inequalities (2.1) and (2.3) imply one another.³⁹

Three observations should be made about this condition (2.1). First, the condition is quite weak. It only says that $A_i$ is made more probable by $A_j$ than by $\neg A_j$. It does not say that $A_i$ is made more probable by $A_j$ than by another proposition $A_k$, nor does it claim anything about propositions different from $A_i$ that are made even more probable by $A_j$. Our condition is silent about the amount of probabilistic support that $A_i$ receives from $A_j$ as compared to the amount of probabilistic support that $A_i$ would have received from another proposition $A_k$. So the condition does not imply that the former amount is greater than the latter, nor does it imply that the amount of probabilistic support given to $A_i$ should exceed a particular threshold.⁴⁰

36 Fumerton 2002, 205.

37 That Fumerton has in mind ‘making more probable’ or ‘increasing the probability’ when he writes about ‘making probable’ is indicated by his use of the expression “the increased probability”.

38 Carnap 1962, xv-xvi.
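The equivalence of (2.1) and (2.3), proved in footnote 39, can be spot-checked numerically. The Python sketch below is our own illustration, not part of the argument: it parametrizes a distribution by $P(A_j)$, $P(A_i|A_j)$ and $P(A_i|\neg A_j)$, computes $P(A_i)$ by the law of total probability, and uses exact rational arithmetic to test both the identity and the agreement of the two conditions on a finite grid. A grid only illustrates the result; the footnote proves it.

```python
from fractions import Fraction
from itertools import product


def conditions_agree(p_j, a, b):
    """Given P(A_j) = p_j, P(A_i|A_j) = a and P(A_i|not A_j) = b, check the
    identity of footnote 39 and report whether (2.1) and (2.3) agree."""
    p_i = a * p_j + b * (1 - p_j)  # law of total probability
    # Footnote 39: P(A_i|A_j) - P(A_i) = P(not A_j) * [P(A_i|A_j) - P(A_i|not A_j)]
    assert a - p_i == (1 - p_j) * (a - b)
    support = a > b      # condition (2.1)
    firmness = a > p_i   # condition (2.3)
    return support == firmness


grid = [Fraction(k, 10) for k in range(11)]         # conditionals 0, 1/10, ..., 1
interior = [Fraction(k, 10) for k in range(1, 10)]  # P(A_j) strictly between 0 and 1
all_agree = all(
    conditions_agree(p_j, a, b) for p_j, a, b in product(interior, grid, grid)
)
print(all_agree)  # True
```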

Second, the condition of probabilistic support is not a measure. As Branden Fitelson has emphasized, there are many different measures of probabilistic support or confirmation, and they are often ordinally inequivalent to one another, that is, they can rank the same cases of support in different orders.⁴¹ This might be problematic in many contexts, but it is not an issue for us. For the various measures of probabilistic support all agree in stating that $A_j$ probabilistically supports $A_i$ if and only if $P(A_i|A_j) > P(A_i|\neg A_j)$, and this is all we need here.
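To make the second observation concrete, the Python sketch below computes three measures that are commonly discussed in the confirmation literature: the difference $P(A_i|A_j) - P(A_i)$, the ratio $P(A_i|A_j)/P(A_i)$ and the likelihood ratio $P(A_j|A_i)/P(A_j|\neg A_i)$ (logarithms are omitted, which does not affect the signs). The selection of these three measures and all the numbers are our own assumptions. On a grid of distributions all three agree in sign with (2.1), while two hand-picked cases show the difference and ratio measures ranking the same pair of cases in opposite orders.

```python
from fractions import Fraction
from itertools import product


def measures(p_j, a, b):
    """Difference d, ratio r and likelihood ratio l for hypothesis A_i and
    evidence A_j, given P(A_j) = p_j, P(A_i|A_j) = a, P(A_i|not A_j) = b.
    Ratios are left unlogged: log r > 0 iff r > 1, and likewise for l."""
    p_i = a * p_j + b * (1 - p_j)                # P(A_i)
    d = a - p_i                                  # P(A_i|A_j) - P(A_i)
    r = a / p_i                                  # P(A_i|A_j) / P(A_i)
    p_j_given_i = a * p_j / p_i                  # P(A_j|A_i), by Bayes' theorem
    p_j_given_not_i = (1 - a) * p_j / (1 - p_i)  # P(A_j|not A_i)
    l = p_j_given_i / p_j_given_not_i
    return d, r, l


def signs_agree(p_j, a, b):
    """(2.1) holds iff d > 0 iff r > 1 iff l > 1."""
    d, r, l = measures(p_j, a, b)
    return (a > b) == (d > 0) == (r > 1) == (l > 1)


# Values strictly inside (0, 1) keep every denominator non-zero.
vals = [Fraction(k, 10) for k in range(1, 10)]
all_signs_agree = all(signs_agree(*t) for t in product(vals, repeat=3))
print(all_signs_agree)  # True

# Ordinal inequivalence: d and r rank these two cases differently.
d1, r1, _ = measures(Fraction(1, 2), Fraction(1, 5), Fraction(0))       # P(A_i) = 1/10
d2, r2, _ = measures(Fraction(1, 2), Fraction(7, 10), Fraction(3, 10))  # P(A_i) = 1/2
print(d1 < d2, r1 > r2)  # True True
```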

Third, the condition does not need a threshold. It could reasonably be objected that the phrase ‘making probable’ involves more than ‘making more probable’. If $A_j$ makes $A_i$ probable, then surely the effect of $A_j$ on $A_i$ must be not merely to raise the probability of $A_i$, but also to raise it above one half (or perhaps above some agreed-upon threshold greater than a half). In Section 6.5 we shall say some more about thresholds, but the chief thing to realize is that a threshold condition is not needed for our purpose: a threshold is not required for finding out to what extent infinite justificatory chains make sense. The only thing we need for this is that (2.1) is to be regarded as a minimal condition implicit in the notion of ‘making probable’.

39 From the definition of conditional probability it follows that

    $P(A_i|A_j) - P(A_i) = P(\neg A_j)\,[P(A_i|A_j) - P(A_i|\neg A_j)]$.

The right-hand side of this equation is greater than zero if (2.1) is true, so the left-hand side must be greater than zero too, and this implies (2.3). By similar reasoning, it is clear that, if (2.3) is true, then (2.1) is true. Recall that here $P(A_j)$ is neither 0 nor 1.

40 We will assume, however, that $i \neq j$. We are after all interested in probabilistic support as a condition of epistemic justification, and in accordance with Peter Klein’s Principle of Avoiding Circularity (PAC), $A_i$ may not be in its own evidential ancestry.

41 Fitelson 1999.

We consider condition (2.1) to be an essential ingredient of the relation of epistemic justification, insofar as it involves probabilification or ‘making probable’. It is important to keep in mind that the condition itself is completely formal. It does not imply anything about the ontological character of the relata, nor does it make any assumption about how the probability relation should be interpreted. If $A_i$ is a belief or proposition that is justified by $A_j$, then the justifier $A_j$ can be anything: a belief, a proposition, a fact, an event, a perception, a memory or a neural state; this is completely irrelevant. And $P$ can also be anything: subjective or objective or logical probability, that does not matter. What does matter is that (2.1) is governed by the formal probability calculus, i.e. the axioms of Kolmogorov and the theorems that follow from them.⁴²

Precisely because the calculus is formal and thus uninterpreted, condition (2.1) is neutral with respect to internalism and externalism, and also with respect to evidentialism and reliabilism. The condition can be combined with internalistic and externalistic views concerning the ontology of reasons, as well as with evidentialist and reliabilist understandings of the justification relation. After all, internalists, externalists, evidentialists and reliabilists do not differ about the probability calculus, nor is there anything in their positions that goes against formalizing ‘$A_i$ is made probable by $A_j$’ in terms of (2.1). The only differences between them are about interpretations: whereas internalists construe $A_i$ and $A_j$ internalistically, externalists construe them externalistically, and whereas evidentialists interpret the probability relation in logical terms, reliabilists interpret it in nomological terms, typically in terms of probabilistic causality. But these are just differences in interpretations, and they do not touch the underlying formal level.


42 If $P(A_i|A_j) < P(A_i|\neg A_j)$, then it is $\neg A_j$ rather than $A_j$ that supports $A_i$ probabilistically. The point that the negation of one event could be the cause of another was already made by Hans Reichenbach when he introduced his concept of the common cause. Reichenbach notices that in this case we must revise our opinion on the working cause, or as he puts it succinctly “in this case $A_j$ and $\neg A_j$ have merely changed places” (Reichenbach 1956, 160; we have replaced his C by our $A_j$). Thus an arbitrary chain of probabilistic relations can be recast in a form in which probabilistic support (or neutrality) holds all along the chain. This does not mean that any proposition can be justified by any probabilistic chain, for the condition of probabilistic support is only a necessary, not a sufficient condition of justification. In Section 6.5 we will look at some further desiderata for an adequate description of what justification entails.
It might be true that more people are inclined to interpret (2.1) along the lines of internalism and evidentialism.⁴³ Under that interpretation, $A_i$ and $A_j$ are both beliefs or propositions, and $P$ is construed as subjective or epistemic or logical probability. The point we want to make here is that (2.1) can just as easily be understood in accordance with externalism and reliabilism. Under this interpretation, $A_i$ and $A_j$ can be beliefs, perceptual appearances, memories, and so on. For example, if $A_i$ is my belief in the proposition that a cow is grazing in front of me, and $A_j$ is my seeing a cow grazing in front of me, then (2.1) states that it is more likely that my belief in a grazing cow is true, given that I have this perception, than when I do not have this perception. Here $P$ is an objective probability, depending on the frequency of events, where the events are ‘seeing a grazing cow’ and ‘believing that there is a grazing cow’. Whether or not (2.1) holds here is determined by empirical research or, more generally, by past performance. Is it the case that my seeing a cow grazing is more often followed by a belief in a cow grazing than that my perception of a horse jumping is followed by a belief that a cow is grazing? The answer is presumably in the affirmative, so (2.1) is satisfied.
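On this externalist reading, checking (2.1) amounts to comparing two relative frequencies. The Python sketch below assumes a hypothetical record of occasions, each noting whether I saw a grazing cow and whether I came to believe that a cow was grazing; the counts are invented purely for illustration and are not empirical data. Note that it compares seeing a cow with not seeing one, which is what (2.1) requires, rather than with a specific alternative perception such as a jumping horse.

```python
from collections import Counter

# Hypothetical record keyed by (saw_grazing_cow, believed_cow_grazing).
# These counts are invented for illustration; they are not empirical data.
occasions = Counter({
    (True, True): 95, (True, False): 5,
    (False, True): 2, (False, False): 898,
})


def belief_frequency(record, saw):
    """Relative frequency of the belief among occasions with the given perception."""
    total = sum(n for (s, _), n in record.items() if s == saw)
    believed = sum(n for (s, b), n in record.items() if s == saw and b)
    return believed / total


p_given_seeing = belief_frequency(occasions, saw=True)       # 95/100
p_given_not_seeing = belief_frequency(occasions, saw=False)  # 2/900
print(p_given_seeing > p_given_not_seeing)  # True: (2.1) holds for this record
```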

The thing justified need not be a belief of which the probability is deter- 
mined on the basis of perceptual appearances. It can also be the other way 
around. In cases of wishful thinking or of harbouring strong suspicions, my 
beliefs or my desires can cause in me certain perceptual appearances. Here 
the causal course runs in the opposite direction. Again, we determine empiri- 
cally whether or not a causal process is in fact taking place, and thus whether 
or not (2.1) is satisfied: some people are more prone to wishful thinking or 
to being suspicious than others. 

It might happen that a causal process gives rise to a false belief. Optical illusions are a classic example. When I am walking in the desert, the refraction of light from the sky by heated air can cause me to believe that there is a sheet of water in front of me. This causal process is probabilistic (at some times I am more vulnerable to this optical illusion than at others), but it is assumed that (2.1) is fulfilled (I mostly fall prey to the illusion when walking in the desert). Goldman will presumably say that this is not a reliable belief-forming process, and that my belief that there is a sheet of water in front of me is not justified. As we will explain in the next section, however, we are not proposing to define justification as the condition of probabilistic support (2.1), nor are we saying that, if the condition is satisfied, then there is justification. Our claim is only the moderate one that the condition of probabilistic support is a necessary ingredient of epistemic justification. This claim in no way conflicts with a reliabilist stance à la Goldman.⁴⁴


43 Thus René van Woudenberg and Ronald Meester have argued that “the traditional epistemic regress problem” is by definition cast in internalistic and doxastic terms, and they seem to hold that it should also be considered in those terms (Van Woudenberg and Meester 2014). For more on internalism and the regress problem, see Simson 1986 and Jacobson 1992.

In sum, our condition of probabilistic support is neutral with respect to many debates about justification. One may understand justification internalistically or externalistically, in accordance with evidentialism or with reliabilism; all these views can be combined with the condition of probabilistic support as a mechanism underlying justification. And there are more views that we can accommodate. For example, the condition of probabilistic support is consistent with either side in the debate about the difference between diachronic and synchronic justification, to which especially Swinburne has drawn attention;⁴⁵ the only thing we have to do to account for this difference is to add a time index to (2.1) or to remove it. Furthermore, there is no reason to restrict the causal processes modelled by (2.1) to individual people; we might well regard them as taking place in a community. Alvin Goldman writes:


The task of social epistemology ... is to evaluate truth-conducive or truth-inhibiting properties of such relationships, patterns, and structures. What kinds of channels, and controls over channels, comprise the best means to ‘verific’ ends? To what degree should control be consensual, and to what degree a function of (ascribed) expertise, or ‘authority’? To what extent should diversity of messages be cultivated?⁴⁶


44 While the application of (2.1) to internalism is straightforward, since the probabil- 
ity space is homogeneous (containing only propositions or beliefs), the application 
to externalism is a bit more complicated. Within externalism many things can be 
reasons, so the probability space is rather diverse, containing not only beliefs and 
propositions, but also perceptions, memories, facts, and so on. This difficulty can 
however be handled as follows. First we define different spaces: a space of beliefs, a 
space of perceptions, a space of memories, and so on. Then we define a space which 
is the Cartesian product of all those spaces. And finally we decide which relation 
of probabilistic causality we want to focus on. Do we want to focus on perceptions 
causing beliefs? Or memories causing beliefs? Or beliefs causing desires? Desires 
causing beliefs? Deciding on the answers to these questions is necessary in order 
to keep a grip on the heterogeneous probability space, but it is just a slight technical
complication, and it is not important for the general philosophical point that (2.1) is 
neutral with respect to both internalism and externalism. 

45 Swinburne 2001, Chapters 2, 7, and 8. 

46 Goldman 1986, 5-6. 
Because the probability calculus is neutral with respect to the nature of the relationships, patterns, and structures that Goldman is talking about, it can accommodate social epistemology alongside individual epistemology.⁴⁷
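Footnote 47 states that probabilistic support itself is reflexive and symmetrical but not transitive, and footnote 42 notes that counter-support by $A_j$ turns into support by $\neg A_j$. These properties can be checked in a toy model of our own devising: four equally probable possible worlds, with each proposition identified with the set of worlds in which it is true. The Python sketch below uses exact fractions; the propositions `x`, `y`, `z` are hypothetical and were chosen only to produce a counterexample to transitivity.

```python
from fractions import Fraction
from itertools import combinations

# Toy model (our assumption): four equally probable possible worlds.
WORLDS = frozenset(range(4))


def prob(event):
    return Fraction(len(event), len(WORLDS))


def cond(a, b):
    """P(a|b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)


def supports(b, a):
    """Condition (2.1): b probabilistically supports a iff P(a|b) > P(a|not b)."""
    return cond(a, b) > cond(a, WORLDS - b)


# All propositions with probability strictly between 0 and 1.
EVENTS = [frozenset(c) for k in range(1, 4) for c in combinations(WORLDS, k)]

reflexive = all(supports(a, a) for a in EVENTS)
symmetric = all(supports(b, a) == supports(a, b) for a in EVENTS for b in EVENTS)

x, y, z = frozenset({0}), frozenset({0, 1}), frozenset({1})
print(reflexive, symmetric)                            # True True
print(supports(x, y), supports(y, z), supports(x, z))  # True True False
print(supports(WORLDS - x, z))  # True: not-x supports z (cf. footnote 42)
```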

Some philosophers have doubted that our condition of probabilistic support is in fact neutral. Richard Fumerton first presents the following as a neutral, preliminary characterization of epistemic justification:


epistemic justification ... must make probable the truth of the proposition be- 
lieved, 


and then comments: 


Our preliminary characterization of justification as that which makes probable 
the truth of a proposition may not in the end be all that neutral.⁴⁸


Fumerton gives two reasons why the ‘making probable’ relation may after all not be neutral. The first is that a “normative feature of epistemic justification ... may call into question the conceptual primacy of probability as a key to distinguishing epistemic reasons from other sorts of reasons”.⁴⁹ The second is that “if one understands the relation of making probable in terms of a frequency conception of probability, one will inevitably beg the question with respect to certain internalist/externalist debates over the nature of justification”.⁵⁰ Let us briefly look at each of these two reasons.

The idea behind the first is that epistemic reasons differ from moral or prudential or legal ones, since an epistemic goal is not the same as a goal that is moral, prudential, or legal. Whenever one believes a proposition for epistemic reasons, one believes that proposition because it is probably true, not because it is useful or moral to believe it. But suggestive as this account of an epistemic reason may be, says Fumerton, “we are in danger of collapsing the distinction between true belief and justified belief”.⁵¹ For if a belief is justified if and only if it is probably true, then “our ‘goal’ oriented account of epistemic justification becomes pathetically circular”.⁵²


47 The condition of probabilistic support is also neutral with respect to several desiderata for the justification relation. For example, Oliver Black required that the relation be irreflexive and transitive (Black 1988; for a “more frugal” formulation of the desiderata, see Black 1996); Romane Clark required transitivity and asymmetry (Clark 1988, 373); and Andrew Cling has argued that having both transitivity and irreflexivity is too strong as a desideratum, proposing an “improved version” of the epistemic regress problem (Cling 2008; for critical replies to Cling, see Kajamies 2009 and Roche 2012; for a further discussion about the transitivity of justification see Post and Turner 2000 versus McGrew and McGrew 2000).

Our claim that probabilistic support is necessary for justification does not preclude justification’s being irreflexive or transitive or asymmetrical, even though probabilistic support itself is reflexive, not transitive, and symmetrical. In other words, the claim is not in conflict with the above desiderata for the justification relation, but it does not necessitate them. We will come back to this point at the end of Chapter 6.

48 Fumerton 2002, 205.

49 Ibid., 205-206.

50 Ibid., 206.

Fumerton’s worry can however be dispelled. In order to credit the impor- 
tant role of probabilistic support for justification, we need not say that a be- 
lief is probably true if and only if it is justified. It is enough to say that being 
probably true is a necessary ingredient of being justified. It seems to us that 
Fumerton is confusing the conceptual primacy of probability for justification 
with its sufficiency. He would have been right if ‘probabilistic support’ had 
been taken as sufficient for epistemic justification, but as we have said, there 
is no need to do so. Probabilistic support is merely a necessary, and by no means a sufficient, condition for justification.⁵³
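The case raised by Van Woudenberg and Meester in footnote 53 shows why sufficiency matters. In the Python sketch below the numbers are ours, chosen for illustration: with $P(A_j) = 1/2$, $P(A_i|A_j) = 0.02$ and $P(A_i|\neg A_j) = 0.01$, condition (2.1) holds, yet $P(A_i)$ is only 0.015. Because (2.1) is offered only as a necessary condition, nothing follows from it about $A_i$’s being justified.

```python
from fractions import Fraction


def total_probability(p_j, a, b):
    """P(A_i) from P(A_j) = p_j, P(A_i|A_j) = a and P(A_i|not A_j) = b."""
    return a * p_j + b * (1 - p_j)


# Illustrative values (our assumption): both conditionals are very small.
p_j, a, b = Fraction(1, 2), Fraction(2, 100), Fraction(1, 100)
print(a > b)                         # True: (2.1) is satisfied
print(total_probability(p_j, a, b))  # 3/200, i.e. P(A_i) = 0.015
```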

What about Fumerton’s second reason? He is certainly right that the ‘mak- 
ing probable’ relation will support externalistic and undermine internalistic 
positions if understood in frequency terms. But our point is that we need not 
understand the relation in frequency terms, nor need we understand it in non- 
frequency terms. If we regard the relation at its formal level, as the condition 
(2.1), then we are not committed to any interpretation. While Fumerton re- 
gards the ‘making probable’ relation as something that comes with an inter- 
pretation, we have construed it as a mere formal structure with uninterpreted 
symbols. 

Apparently Fumerton sees the poly-interpretability of the calculus as a 
drawback for the idea of modelling justification by means of probability the- 
ory. This is the view not only of Fumerton, the internalist, but also of Gold- 
man, the externalist: 


Another admissible theory would let justifiedness arise from the corpus of a cognizer’s beliefs plus probabilistic relationships between the target beliefs and the beliefs in this corpus (or rather, the propositional contents of these beliefs). But here a theorist must tread carefully. The term ‘probability’ is notoriously ambiguous, and some of its proposed explications implicitly render it a term of epistemic evaluation (tied to what an epistemically rational person would do). For present purposes, ‘probability’ would have to be restricted to some other meaning, for example, a frequency, or propensity sense.⁵⁴


51 Ibid., 209.

52 Ibid.

53 René van Woudenberg and Ronald Meester appear to think that we deem probabilistic support to be sufficient for justification (Van Woudenberg and Meester 2014). They criticize our condition (2.1) on the grounds that it allows $P(A_i|A_j)$ and $P(A_i|\neg A_j)$ both to be very small, so that $P(A_i)$ is also very small; in that case (2.1) is fulfilled, but it would be ridiculous to say that $A_i$ is justified. The criticism of Van Woudenberg and Meester fails precisely because (2.1) is not a sufficient condition for justification.


Goldman is right that the term ‘probability’ is notoriously ambiguous, but only if we take the term to mean ‘calculus plus interpretation’, not if it refers to the calculus alone. We are inclined to consider the poly-interpretability of the calculus as an advantage rather than a drawback, since it enables us to work with a well-defined formal framework from which we can derive consequences that hold irrespective of the interpretation. A comparison with Donald Davidson’s work might make the point clearer. Davidson argued that an action is only explained (or ‘rationalized’, as he calls it) if it is both logically and causally connected to the relevant beliefs and desires.⁵⁵ This, however, raises the question of how the two connections can be combined. How are we to reconcile the position of the so-called causalists with that of the adherents of the Logical Connection Argument, as Frederick Stoutland has aptly called their adversaries?⁵⁶ Davidson’s own ingenious answer, motivated by his anomalous monism, was that causality is essentially dual. It involves singular causal statements of the form ‘token event E causes token event F’, which can be true independently of how E and F are described, as well as causal explanations, which centre around causal laws (‘events of type E cause events of type F’), and thus are valid only under certain descriptions.

Resourceful as Davidson’s answer may be, a reconciliation between log- 
ical and causal connections appears easier once we have taken recourse to 
probability theory. For probability can model both the logical and the causal 
relation between reasons and the beliefs or actions that they explain or jus- 
tify. All we have to do is to replace the logical relations by probabilistic 
relations, and to substitute probabilistic causality for causality tout court. No 
assumption about a dual character of causality is needed. 

As we will show in the chapters to come, the probability calculus has consequences which are very relevant to the possibility and impossibility of infinite epistemic chains. This is not to say that the probability calculus does not face problems, or that its interpretations are unproblematic. It is well known that it has many difficulties that are far from being solved: the problems of old evidence, spurious relations, irrelevant conjunctions, the prior, the reference class, randomness, and so on. Moreover, apart from those technical quandaries, there is the mundane fact that in actual reasoning the calculus often seems to be violated.⁵⁷ These problems are grave indeed, but we do not think they are reasons to reject the calculus as a means of shedding light on the elusive concept of justification. We will say a bit more on this in Section 6.6.


54 Goldman 1986, 24.

55 Davidson 1963, 1970. Cf. our footnote 18.

56 Stoutland 1970; cf. Peijnenburg 1998.


2.5 Smith’s Normic Support 


We have been arguing that the formal condition of probabilistic support is 
a necessary ingredient of epistemic justification and, more generally, that 
Kolmogorovian probability can help us understand what justification is. In 
this section and in the next one, we will discuss what can be seen as two 
objections to these views. 

The first objection is based on work by Martin Smith. It takes its inspira- 
tion from what we have called the third proposal for framing the formal side 
of the justification relation. According to that proposal, knowledge and jus- 
tification are best understood on the basis of subjunctive conditionals, which 
in turn are framed as statements about nearby possible worlds. 

Smith has identified a conception of justification which he claims is “taken 
for granted by a broad range of epistemologists”, and which he calls ‘the risk 
minimisation conception of justification’:


... for any proposition A we can always ask how likely it is that A is true, given present evidence. The more likely it is that A is true, the more justification one has for believing it. The less likely it is that A is true, the less justification one has for believing that it is. One has justification simpliciter for believing A (at least at a first approximation) when the likelihood of A is sufficiently high and the risk of $\neg A$ is correspondingly low. Call this the risk minimisation conception of justification.⁵⁸


57 ‘Seems’, because sometimes the violation is only apparent. For example, there have been many attempts to explain the famous conjunction fallacy, where people deem the probability of a conjunction (‘Linda is a bank teller and a feminist’) to be higher than the probability of a conjunct (‘Linda is a bank teller’). According to one of these explanations, if we reconstruct the reasoning in terms of confirmation measures rather than of bare probability values, then it is no longer fallacious (Crupi, Fitelson, and Tentori 2007). It is true that we often do not know which mistake exactly people are making, or whether they are making a mistake at all (we might after all have insufficient information about their cognitive make-up). But from this it does not follow that the probability calculus is not a useful instrument to investigate whether mistakes are being made, and, if so, what the nature of these mistakes is.


Describing a proposition as ‘likely’ means here that the proposition has “an 
evidential probability that exceeds some threshold 7 that lies close to 1 and 
may be variable or vague”.°’ Smith notes that there is “something very natu- 
ral” about the entire risk minimization view, but nevertheless concludes that 
it is “not true at all”.60 According to him it reduces epistemic justification to 
evidential support, and wrongly so. For it might happen that the evidential 
support is high, well beyond some threshold of acceptance, while intuitively 
we would not say that there is justification. Conversely, there might be jus- 
tification even though the evidential support is relatively low.°! Smith has 
illustrated these claims with several appealing examples. Here is one which 
he borrowed from Dana Nelkin: 


Suppose that I have set up my computer such that, whenever I turn it on, 
the colour of the background is determined by a random generator. For one 
value out of one million possible values the background will be red. For the 
remaining 999 999 values, the background will be blue. One day I turn on my 
computer and then go into the next room to attend to something else. 

In the meantime Bruce, who knows nothing about how my computer’s 
background colour is determined, wanders into the computer room and sees 


58 Smith 2010, 11. Cf. Smith 2016, 2. We have substituted A for P. 

5 Smith 2016, 29. Note that the concept of evidential probability as Smith uses it 
here is not the same as the concept of probabilistic support that we talked about in 
the previous section. Smith says that A is likely if and only if its evidential prob- 
ability, or evidential support, exceeds some threshold: P(A|E) >t, where E is the 
evidence. But we say that P(A|E) satisfies the condition of probabilistic support if 
and only if P(A|E) > P(A|-E). 

6 Thid., 30. 

61 The difference between epistemic justification and evidential support has been
stressed by many others as well. For example, Jarrett Leplin argued that a belief
may be highly probable while not justified, and it may be justified even though its
probability is very low (Leplin 2009, 101-109). We think that the latter claim is
questionable, but even if we grant both claims, Leplin’s argument would not
affect our view. For Leplin is not talking about probabilistic support in our sense, but
about a probability above a certain threshold. The latter also applies to the analysis
by Tomoji Shogenji (Shogenji 2012), which we will discuss in Section 6.5. Other
defenders of the difference between justification and evidential support are Peter
Klein (1999, 312; 2003, 722), Scott Aikin (2011, Chapter 3), and in general
proponents of Williamson’s ‘knowledge first’ approach as well as champions of both the
safety and sensitivity condition for knowledge.
that the computer is displaying a blue background. He comes to believe that
it is. Let’s suppose, for the time being, that my relevant evidence consists of
the proposition that (E₁) it is 99.9999% likely that the computer is displaying
a blue background, while Bruce’s relevant evidence consists of the proposition
that (E₂) the computer visually appears to him to be displaying a blue
background.”62


Let A be the proposition that the computer is displaying a blue background.
It is clear that my evidence E₁ does not imply A, since E₁ is compatible with
a red background. But neither does E₂ imply A: “After all, Bruce could be
hallucinating, or he could be struck by colour blindness, or there could be
some coloured light shining on the screen, etc.”63 The point Smith makes is
that Bruce’s belief in A is a candidate for knowledge, whereas my belief in A
is not:


Bruce’s belief would appear to be a very promising candidate for knowledge 
— indeed, it will be knowledge, provided we will fill in the remaining details 
of the example in the most natural way. My belief, on the other hand, would 
not constitute knowledge even if it happened to be true. If there were a power 
failure before I had the chance to look at the computer screen, I might well 
think to myself ‘I guess I’ll never know what colour the background really
was’. But Bruce certainly wouldn’t think this.64


This means, says Smith, that Bruce is epistemically justified in believing A
while I am not. And this is so, even if we assume that A is more likely given
my evidence E₁ than given Bruce’s evidence E₂:


$$P(A \mid E_1) > P(A \mid E_2).$$


The reason why Bruce is justified and I am not is that the relation between
A and E₂ is one of normic support, whereas the relation between A and E₁
is characterized only by mere evidential support. Mere evidential support
relations only imply that events are likely or unlikely, but normic support
relations tell us when events are normal or abnormal:


Given my evidence E₁, A would frequently be true. Given Bruce’s evidence
E₂, A would normally be true. ...65


If I were to find out that the background of my computer screen is actually 
red, I would conclude that this had been merely very unlikely, not that it 


62 Smith 2010, 13. Cf. Nelkin 2000, 388-389.
63 Smith 2010, 14.

64 Ibid.

65 Ibid., 16.
was abnormal. But if Bruce were to discover that the background actually is red,
he would conclude that something is not right: he would require an explanation
in a way that I would not. In general, E normically supports A if the truth of
both E and ¬A is not only unexpected, but a genuine anomaly:


Say that a body of evidence E normically supports a proposition A just in case 
the circumstance in which E is true and A is false requires more explanation 
than the circumstance in which E and A are both true.66


At first sight, the distinction between normic support and evidential sup- 
port looks much like the distinction between law-like statements and mere 
statistical generalizations in philosophy of science. Smith indeed makes the 
comparison: 


The distinction between the E₁-A relationship and the E₂-A relationship might
fruitfully be compared to the distinction between statistical generalisations
and normic or ceteris paribus generalisations widely accepted in the philosophy
of science ...67


On closer inspection, however, there seems to be a difference. The problem 
in philosophy of science is that we lack a criterion for determining whether 
a particular sequence is law-like or merely accidental. All sequences that 
we encounter are finite, and we never know for sure whether or how they 
will continue — such are the lessons of Hume and Goodman. There might 
come a time when the sun does not rise, or rises only on Sundays, or rises 
in a completely random and unpredictable way. If, per impossibile, we knew 
for sure that a particular statement is a law-like statement, then we would 
be done; we could then safely use this knowledge in our predictions (which 
would hardly be predictions any more). And if, on the other hand, we knew 
that we are dealing with a mere statistical generalization, then we would 
realize that we should proceed cautiously, since we would find ourselves on 
rocky and unreliable ground. 

The problem of law-like versus accidental generalizations is that we have 
no way to determine whether we are in the one or in the other situation, since 


66 Smith 2016, 40. The notion of normic support is further clarified in terms of 
normal worlds: “E normically supports a proposition A just in case A is true in 
all the most normal worlds in which E is true. Alternatively, we might say that 
E normically supports A just in case there is a world in which E is true and A is 
true which is more normal than any world at which E is true and A is false” (ibid., 
42). Smith works out the technical details of normal worlds using not only David 
Lewis’s method of nearby possible worlds, but also Wolfgang Spohn’s ranking the- 
ory (Spohn 2012). See footnote 22 for Goldman on normal worlds. 

67 Smith 2010, 16. Cf. Smith 2016, 39-40, 128. 
we do not know how the sequence will behave in the long run. This problem,
however, no longer exists in the story about Bruce, for there we know whether 
we are dealing with normic or with evidential support. The reason that we 
have this knowledge is that we are being told how the sequence looks in the 
long run: 


In believing that [my] laptop is displaying a blue background, [Bruce is]
actually running a higher risk of error than I would be in believing the same
thing ... If this set-up were replicated again and again then, in the long run,
[Bruce’s] belief about [my] laptop would turn out to be false more often than
my belief about my laptop.68


If a problem remains, then it is a different one. It is that the normic support
that E₂ gives to A is in fact lower than the evidential support that E₁ gives
to A. In other words, the inequality P(A|E₁) > P(A|E₂) is not merely apparent,
based on a finite sequence of observations; it persists in the long run,
and moreover we know that it does. Nonetheless, Smith advises us to base
our belief in A on E₂ rather than on E₁. For only the E₂-A relationship is
one of normic support, and so, according to Smith, only E₂ can be said to
justify A.

We must confess that we have difficulty understanding this. Why base our 
belief on evidence which is less effective? Why rely on someone’s perceptual 
information if we know that his eyesight is poor and our own information is 
more reliable? Smith’s answer will be that in this case the poorer and less 
effective evidence is normically stronger. But what good is normic support 
that contradicts evidential support, not just now but also in the future? What 
is the sense of normic support as part of epistemic justification when it fun- 
damentally disagrees with evidence we know to be true in the long run? 

Of course, it might happen that we do not understand why P(A|E₁) is
greater than P(A|E₂). But if we know that it is greater, should we not take that
fact seriously? And does ‘taking seriously’ not mean that we act on E₁ rather
than E₂? Note that if we do not act on E₁ in this case, a merciless opponent
could use us as a money pump. And the fact that the rune of normic support is
flaunted on our fluttering banner will not prevent us from becoming paupers
in the fullness of time.

Smith himself makes no attempt to downplay these qualms: 


It may be rather tempting, however, for one to simply disregard such
judgements [i.e., to trust E₂ rather than E₁] as confused or naive. Perhaps we are


68 Smith 2016, 35; our emphasis. We have adapted the example so that it fits the
example that Smith describes in Smith 2010.
simply accustomed to relying upon perception when it comes to such matters
and suspending any scruples about its fallibility. Once we do reflect carefully 
upon the fallibility of perception, so this thought goes, these troublesome
intuitions are exposed as a kind of groundless prejudice. I’m not entirely
convinced that this is the wrong thing to say — but I strongly suspect that it is.69


Note that we are not suggesting that justification is the same as evidential
support. We agree with Smith that it is not, and Smith’s many examples are
convincing illustrations of this standpoint.70 Let us be quite clear: we are
not defending the risk minimization picture of justification. We do not think
that justification can be defined as evidential support exceeding a certain
threshold, nor are we saying that A is more justified by E₁ than by E₂ if
the former gives more evidential support to A than the latter. Our claim is
much weaker. We only maintain that probabilistic support, understood as the
condition (2.1) and not to be confused with evidential support (see footnote
59), is a necessary condition of justification.
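The difference between Smith’s ‘likely’ (P(A|E) above a threshold t) and probabilistic support in the sense of (2.1), P(A|E) > P(A|¬E), can be checked on toy numbers. The Python sketch below builds a joint distribution over A and E from P(E), P(A|E) and P(A|¬E); the function names, the threshold t = 0.9 and all probability values are hypothetical choices made only for illustration. The first case is ‘likely’ without probabilistic support; the second has probabilistic support without being ‘likely’.

```python
from fractions import Fraction


def joint_from(p_e, p_a_given_e, p_a_given_not_e):
    """Joint distribution over (A, E), keyed by (truth of A, truth of E)."""
    return {
        (True, True): p_e * p_a_given_e,
        (False, True): p_e * (1 - p_a_given_e),
        (True, False): (1 - p_e) * p_a_given_not_e,
        (False, False): (1 - p_e) * (1 - p_a_given_not_e),
    }


def p_a_given(joint, e):
    """P(A | E = e), recovered from the joint table."""
    return joint[(True, e)] / (joint[(True, e)] + joint[(False, e)])


def likely(joint, t):
    """Threshold reading: the evidential probability P(A|E) exceeds t."""
    return p_a_given(joint, True) > t


def supports(joint):
    """Probabilistic support, condition (2.1): P(A|E) > P(A|not-E)."""
    return p_a_given(joint, True) > p_a_given(joint, False)


t = Fraction(9, 10)  # hypothetical threshold close to 1

# Hypothetical case 1: A is very probable given E, but even more so without E.
high_unsupported = joint_from(Fraction(1, 2), Fraction(95, 100), Fraction(99, 100))
# Hypothetical case 2: E raises the probability of A, yet leaves it low.
low_supported = joint_from(Fraction(1, 2), Fraction(3, 10), Fraction(1, 10))

for name, joint in [("case 1", high_unsupported), ("case 2", low_supported)]:
    print(name, "likely:", likely(joint, t), "supported:", supports(joint))
```

Because the two tests come apart in both directions, the threshold condition and (2.1) are simply different conditions; neither entails the other.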

Probabilistic support is, however, not sufficient. Something has to be added
to probabilistic support in order to turn it into justification. What is this
‘something’? We do not know, but Smith thinks it is ‘normalcy’, that is, the
property that the support is normic. According to what he calls “the normic
theory of justification”, normalcy is necessary and sufficient for justification,
but according to “the hybrid theory”, it is only necessary.71 It is not entirely
clear which of the two theories Smith would finally choose; but as we have


69 Smith 2016, 36.

70 Some examples are about cases in which P(A|B) > P(C|D), where all four
variables are different. For instance, A = ‘I will not win the lottery’, B = ‘I have bought
one ticket in a fair lottery with a million tickets’, C = ‘The person in front of me
will not suddenly drop dead’, D = ‘The person in front of me is young and healthy’.
Then, even if P(A|B) > P(C|D), it is still true, according to Smith, that D
normically supports C, while B only makes A very likely (Smith 2010, 23). We believe
that these cases do not provide much insight into the concept of justification, since
they typically involve a comparison between two totally different domains.

Here is another example by Smith (from his talk ‘When does evidence suffice
for conviction?’ on 30 April 2014 in Groningen; cf. Cohen 1977 and Nesson 1979).
Imagine a hundred people walking out from an electronics store, each carrying a
television. As it turns out, only one television has been paid for, so ninety-nine were
stolen. Since Joe was one of the hundred, the probability that he is a thief is 0.99.
Are we justified in believing that Joe is a thief? Not so, says Smith. Coos Engelsma
proposed that epistemically we are justified, but not morally (private
communication). Engelsma might have a point here, although this does not help the judge, who
still has to weigh epistemic and moral justification against one another.

71 Smith 2016, 76-79.
seen above there are cases in which normic support is inconsistent with ev-
idential support. Hence we cannot have both as necessary conditions. What 
about probabilistic support? Can that be combined with normalcy? In fact it 
is possible, along the lines that Smith describes, to find cases where normic 
support also clashes with probabilistic support. Indeed, the example of Bruce 
and the computer can be tweaked to yield such a clash in the following way. 

Suppose that it is I, and not Bruce, who sometimes observes my
computer’s screen. An evil hypnotist has, however, caused me to forget about the
random generation of the background colour whenever I actually do observe
the screen, but to remember how I programmed the boot routine when I am
not looking at the screen. Now E₂ (the proposition that I see the colour to
be blue) is true iff ¬E₁ is true, where ¬E₁ amounts to my not knowing about
the random generator. In repeated boot sequences, P(A|E₁) > P(A|E₂)
becomes P(A|¬E₂) > P(A|E₂). Since it is E₂ that gives normic support to A,
we have thereby constructed an inconsistency between normic and
probabilistic support. Other less far-fetched examples are doubtless possible. And
if probabilistic support and normic support à la Smith are not consistent, one
or the other has to be rejected. The foregoing has made clear where our
allegiance lies: we believe that it does not make sense to say that E justifies
A without assuming that P(A|E) > P(A|¬E). Thus, when in the rest of this
book we talk about ‘justification’ or ‘epistemic justification’ or ‘probabilistic
justification’, we will always mean ‘probabilistic support plus something
else’. The indispensable rôle of probabilistic support as the inequality (2.1)
will again become clear in Section 5.3, where we propose our view of
justification as a trade-off.


2.6 Alston’s Epistemic Probability 


The second objection to our view is based on arguments by William Alston. 
As Alston sees it, Kolmogorov probability falls short of analyzing crucial 
epistemological issues. Although he in no way wishes to make light of the 
importance of probability, he believes that the Kolmogorovian rendering of 
it is deficient, not of course for understanding justification (since, according 
to him, there is no such thing), but as an instrument for analyzing epistemic 
desiderata. Especially the desiderata that Alston deems to be “the most
fundamental ones”, namely the so-called truth-conducive desiderata, rely
heavily on concepts that have to do with probability.72

Alston lists five desiderata that are truth conducive, of which the first two 
are especially interesting for us: 


1. The subject has adequate evidence (reasons, grounds ...) for the belief
(Aᵢ).


2. Aᵢ is based on adequate evidence (reasons, grounds ...).73


The idea is that 1 is primarily about logical relations between propositions,
while 2 is about the basing of beliefs. Alston deems 2 epistemically more
desirable than 1, because he sees 2 as the actualization of the possibility
provided by 1, and “the possibility of something desirable is less desirable than
its realization.”74 He further notes that one could think of the basing relation
as being causal in character, as long as one realizes that it is a special kind of
causality, namely “the kind involved in the operation of input-output
mechanisms that form and sustain, and so on, beliefs in a way that is characteristic
of human beings.”75

What does the term ‘adequate’ in 1 and 2 mean? Alston states that, if 1
and 2 are to be epistemic desiderata, “adequacy must be so construed that
adequate evidence ... for Aᵢ entails the probable truth of Aᵢ.”76 In a further
attempt to explain the meaning of ‘adequate’ he writes:


The initial intuitive idea is that the ground is an indication that the belief is
true, not necessarily a conclusive indication for that, but at least something
that provides significant support for taking it to be true. Thus it is natural to
think of an adequate ground of a belief Aᵢ as something such that basing Aᵢ on
it confers a significant probability of truth on Aᵢ.77


Alston stresses, and this is the salient point here, that his use of the word
‘probability’ deviates from its use in a Kolmogorovian context.
Probability for Alston is, as he dubs it, ‘epistemic conditional probability’ or
for short ‘epistemic probability’. It is subject to three constraints: it applies
to beliefs, it is to be understood as ‘the probability that a belief is true’, and it


72 Alston 2005a, 81.

73 Ibid., 43, 81. Here and elsewhere we have replaced Alston’s B by Aᵢ.
74 Ibid., 90.

75 Ibid., 84.

76 Ibid., 43. We have added the last italics.

77 Ibid., 94.
is essentially conditional.78 As such these three constraints do not yet seem
to carve out a non-Kolmogorovian probability, but Alston highlights three
ways in which his epistemic conditional probability “fails to coincide with
conditional probability as typically treated in probability theory.”79 Below
we will discuss all three of them, arguing that none of them conflicts with
orthodox probability theory.

The first way in which Alston’s epistemic probability is supposed to devi- 
ate from standard probability theory hinges on the difference between dox- 
astic and nondoxastic (primarily experiential) grounds of belief: 


First look at doxastic grounds. Suppose S’s belief that Susie is planning to
leave her husband (Aᵢ) is based on S’s belief that Susie told her close friend,
Joy, that she was (Aⱼ). To decide how strong an indication the belief that Aⱼ is
of the truth of the belief that Aᵢ, we have to look at two things. First, if we stick
for the moment as long as possible with the treatment in terms of propositions,
the relation between the propositions that are the contents of these beliefs, Aⱼ
and Aᵢ, is one factor that influences the conditional probability of Aᵢ on Aⱼ.
But, second, we have to look at the epistemic status of the belief that Aⱼ. For
even if the conditional probability of Aᵢ on Aⱼ is high, that won’t put S in a
strong epistemic position in believing that of Aᵢ if S has no good reason, or
not a good enough reason, to believe that Aⱼ. This consideration is sufficient
to show that where the ground is doxastic the adequacy of the ground is not
identical with the conditional probability of the propositional content of the
target belief on the propositional content of the grounding belief.

With nondoxastic grounds, on the other hand, we are not faced with this 
second factor. Where my ground is a certain visual experience rather than, for 
example, a belief that I have that experience, the ground is a fact rather than 
a belief in a fact. Hence no problem can arise with respect to the epistemic 
status of the ground since that ground is not the sort of thing that can have 
an epistemic status. And so the adequacy of a nondoxastic ground coincides 
exactly with the conditional probability of the propositional content of the 
belief in that fact, construed as a true proposition. Here conditional probability 
as treated in probability theory can translate directly into an epistemic status.80


Here Alston suggests that, if the ground Aⱼ is a belief, then standard
probability theory will look only at the conditional probability of the target belief


78 Ibid., 95. Alston maintains that “conditional probabilities are in the center of the 
picture for the epistemology of belief” (ibid.). We fully agree with him here. After 
all, our condition of probabilistic support is made up of conditional probabilities, 
and our view that epistemic justification is intrinsically relational (see Section 2.2) 
also accommodates that point. 

79 Ibid., 97.

80 Ibid.
Aᵢ, given the grounding belief Aⱼ:
$$P(A_i \mid A_j).$$


In contrast, his own epistemic probability also accounts for the epistemic
status of Aⱼ itself, P(Aⱼ), so that we would obtain


$$P(A_i \mid A_j)\,P(A_j).$$


Alston’s suggestion is, however, incorrect. Standard probability accounts not
only for the epistemic status of Aⱼ, namely P(Aⱼ), but also for the epistemic
status of ¬Aⱼ, namely P(¬Aⱼ). The latter is as important as the former. For
if the probability of Aᵢ is conditioned on the probability of Aⱼ, then one can
only calculate the former probability if one also takes into account what that
probability would be in case Aⱼ is false. In standard probability theory the
fact that the probability of Aᵢ is conditioned by the probability of Aⱼ is
expressed by the rule or law of total probability:


$$P(A_i) = P(A_i \mid A_j)\,P(A_j) + P(A_i \mid \neg A_j)\,P(\neg A_j). \tag{2.4}$$


So it seems that Alston’s mistake is twofold. First, he incorrectly suggests that
standard probability theory does not consider P(Aⱼ), and second, he himself
neglects the relevance of the second term in (2.4). As we will see in the
next chapter, the erroneous neglect of the second term of the rule of total
probability is a mistake that has occurred more often in philosophy; even
such notable scholars as Clarence Irving Lewis and Bertrand Russell fell
prey to it.

Formula (2.4) also enables us to understand better what Alston says about
the case in which the ground Aⱼ is not a belief. Alston correctly notes that,
if the ground is nondoxastic, it is a fact rather than a belief in a fact. This
fact, “construed as a true proposition”, has probability 1, so P(Aⱼ) = 1. This
means that P(¬Aⱼ) = 0, and thus that (2.4) reduces to the “conditional
probability as treated in probability theory”, viz.

$$P(A_i) = P(A_i \mid A_j),$$

in accordance with what Alston states.
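The gap between the full rule (2.4) and the product P(Aᵢ|Aⱼ)P(Aⱼ) that Alston’s account suggests can be seen numerically. The Python sketch below uses exact fractions; the three input probabilities and the function names are hypothetical. The product on its own is the joint probability P(Aᵢ ∧ Aⱼ), not P(Aᵢ); the two coincide only when the second term of (2.4) vanishes, for instance when P(Aⱼ) = 1, as in the nondoxastic case.

```python
from fractions import Fraction


def total_probability(p_i_given_j, p_i_given_not_j, p_j):
    """Rule (2.4): P(Ai) = P(Ai|Aj)P(Aj) + P(Ai|not-Aj)P(not-Aj)."""
    return p_i_given_j * p_j + p_i_given_not_j * (1 - p_j)


def first_term_only(p_i_given_j, p_j):
    """P(Ai|Aj)P(Aj): the first term of (2.4), i.e. the joint P(Ai and Aj)."""
    return p_i_given_j * p_j


# Hypothetical doxastic ground: P(Ai|Aj) = 0.9, P(Ai|not-Aj) = 0.2, P(Aj) = 0.6.
a, b, p_j = Fraction(9, 10), Fraction(2, 10), Fraction(6, 10)
print(total_probability(a, b, p_j))  # 31/50, i.e. 0.62
print(first_term_only(a, p_j))       # 27/50, i.e. 0.54: the second term is missing

# Nondoxastic ground: a fact construed as a true proposition, so P(Aj) = 1.
print(total_probability(a, b, Fraction(1)))  # 9/10, i.e. just P(Ai|Aj)
```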

The second respect in which Alston’s epistemic probability allegedly fails
to conform to the standard probability calculus concerns the relevance that
the ground Aⱼ has to the target Aᵢ. Alston illustrates his point by referring to
the case where Aᵢ is a necessary truth or necessary falsehood:
In the standard probability calculus the probability of every necessary truth
is 1 and the probability of every necessary falsehood is 0. This makes it im- 
possible to use conditional probabilities in assessing the adequacy of grounds 
for necessarily true or false beliefs. Since every necessary truth has a proba- 
bility of 1, no matter what else is the case, its conditional probability on any 
proposition whatever is 1. This ‘rigidity’ of the probability of necessary truths 
prevents it from capturing what we are after in thinking of the adequacy of 
the grounds. One who supposes that a person who believes 2 + 2 = 4 on the 
basis of the belief that all crows are black, thereby believes the former on the 
basis of a significantly adequate ground, is missing the epistemological boat. 
In thinking of a ground as adequate to some considerable degree, we take it 
to render what it grounds as more or less probable. It must make a significant 
difference to the probability of the grounded belief. ...The axioms of arith- 
metic are adequate grounds for 2 + 2 = 4, unlike the proposition that all crows 
are black. But this will have to be explained on the basis of some other than 
the probability calculus.81


In this passage, Alston is criticizing the standard probability calculus on two
points. First, in order for the ground Aⱼ to be adequate (or relevant) for the
target Aᵢ, Aⱼ must render Aᵢ more or less probable. But if Aᵢ is a necessary
truth or falsehood, then standard probability implies that Aⱼ will not render
Aᵢ more or less probable. Thus Aⱼ will not be adequate or relevant to Aᵢ, and
this is counterintuitive. Second, whether Aⱼ renders Aᵢ more or less probable,
and thus whether Aⱼ is (ir)relevant to Aᵢ, will have to be determined outside
the probability calculus, and this is troublesome.

What to make of these two points? Let’s begin with the first one. We will
follow Alston in describing an adequate or relevant ground as one that makes
the target more or less probable; this in fact sits well with our condition
of probabilistic support. Alston is right that in standard probability theory,
if the target Aᵢ is a tautology or a contradiction, then the ground Aⱼ will
not render Aᵢ more or less probable, and will in that sense be irrelevant to
the target. But why call this counterintuitive or otherwise problematic? It
would be stranger, and even inconsistent, if something that already had the
maximum (or minimum) probability value were to acquire an even higher (or
lower) one.

Moreover, calling Aᵢ a tautology or a contradiction (necessary truth or
falsehood) already presupposes a system in which Aᵢ has that character. That
is, if Aᵢ is 2 + 2 = 4, then Aᵢ is a tautology relative to any system equivalent to
the axioms of arithmetic. So if Aⱼ is such a system, then P(Aᵢ|Aⱼ) = 1. The
situation remains the same if we add to Aⱼ the proposition that all crows are
black: P(Aᵢ|Aⱼ and all crows are black) = 1. What if the ground Aⱼ consists


81 Ibid., 97-98. 
only of the proposition that all crows are black and nothing more, in particular
nothing that is equivalent to the axioms of arithmetic? In that case Aᵢ and
Aⱼ are independent of each other,


$$P(A_i \wedge A_j) = P(A_i)\,P(A_j),$$


which implies (provided P(Aⱼ) > 0) that P(Aᵢ|Aⱼ) is equal to P(Aᵢ). Here Aⱼ is
also irrelevant to Aᵢ (and vice versa), since Aⱼ does not render Aᵢ more or less
probable. However, Aⱼ is now irrelevant for a completely different reason: it is
irrelevant, not because it already confers upon Aᵢ the maximum probability value,
but because it is independent of Aᵢ.
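The two kinds of irrelevance can be told apart in a toy possible-worlds model. In the Python sketch below a world fixes the truth values of two contingent atoms: `sum_claim` stands for ‘2 + 2 = 4’ treated as an unanalysed atom (as when the ground contains nothing equivalent to arithmetic), and `crows_black` for ‘all crows are black’. The atom names and their prior weights are hypothetical, and the atoms are made independent by construction; a necessary truth is modelled as a proposition that holds in every world.

```python
from fractions import Fraction
from itertools import product

# Hypothetical prior weights for two independent contingent atoms.
PRIOR = {"sum_claim": Fraction(1, 2), "crows_black": Fraction(7, 10)}


def worlds():
    """Yield each world (a truth assignment to the atoms) with its weight."""
    atoms = list(PRIOR)
    for values in product((True, False), repeat=len(atoms)):
        world = dict(zip(atoms, values))
        weight = Fraction(1)
        for atom, value in world.items():
            weight *= PRIOR[atom] if value else 1 - PRIOR[atom]
        yield world, weight


def prob(prop, given=lambda w: True):
    """P(prop | given), summing world weights; propositions are predicates."""
    joint = sum(wt for w, wt in worlds() if prop(w) and given(w))
    marginal = sum(wt for w, wt in worlds() if given(w))
    return joint / marginal


def necessary(world):
    """A necessary truth: it holds in every world."""
    return True


def sum_claim(world):
    """Contingent stand-in for '2 + 2 = 4' (hypothetical atom)."""
    return world["sum_claim"]


def crows(world):
    """'All crows are black' (hypothetical atom)."""
    return world["crows_black"]


# First kind of irrelevance: the target already has probability 1.
print(prob(necessary), prob(necessary, crows))  # 1 1
# Second kind of irrelevance: target and ground are independent.
print(prob(sum_claim), prob(sum_claim, crows))  # 1/2 1/2
```

In the first case conditioning cannot raise a probability that is already maximal; in the second it leaves the probability unchanged because the weights factorize.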

Alston’s second point of criticism is that we have to go outside probability
theory in order to establish whether a ground is relevant to a target: we can
only explain that Peano arithmetic is relevant to 2 + 2 = 4, and that ‘crows
are black’ is not, on some other basis than the probability calculus. We think
there is a confusion here. A comparison with standard logic might help. Ask
yourself: is Aⱼ relevant to Aᵢ in the sense that Aⱼ makes Aᵢ true (rather than
probable)? The answer to this question depends first and foremost on what
Aⱼ and Aᵢ mean. Let Aᵢ mean ‘Feike can swim’.82 If Aⱼ means ‘Feike is a
Frisian and all Frisians can swim’, then Aⱼ is clearly relevant to Aᵢ, and ‘if Aⱼ
then Aᵢ’ expresses a logical connection. The situation is the same in standard
probability theory. If Aⱼ means ‘Feike is a Frisian and 9 out of 10 Frisians
can swim’, then Aⱼ is obviously relevant to Aᵢ, and P(Aᵢ|Aⱼ) = 0.9 expresses
a logical connection. Of course, whether Aⱼ is true can only be established
outside logic or probability theory: we need empirical information in order
to determine whether all Frisians, or only 9 out of 10, can swim. The fact that
both in logic and in probability theory we often need the world to determine
whether premises are true is a fact of life; it is a deficiency of neither logic
nor probability theory.

The third way in which, according to Alston, epistemic conditional prob- 
ability fails to coincide with conditional probability, as treated in probabil- 
ity theory, has to do with the basing relation. Whereas ordinary conditional 
probability is typically concerned with relations between propositional con- 
tents, Alston’s conditional probability focuses on relations between beliefs 
and their grounds; in contrast to the former, the latter are basing relations. In 
the previous pages we have argued that this difference between the two re- 
lations can be seen as a difference between interpretations of the probability 
calculus: logical or conceptual in the one case, and nomological or causal in 


82 We borrow the example from Alston, who in turn has borrowed it from Alvin 
Plantinga (ibid., 105). 
the other. This would mean that, for Alston, Aⱼ is the ground on which Aᵢ
is (probabilistically) based. However, this appears to be too simple, and not
quite in accordance with what Alston writes. Alston does not say that the
belief Aᵢ is based on a ground, but that:


the basing of the belief [Aᵢ] on the ground ... is the condition on which the
probability of the belief [Aᵢ] is conditional.83


So rather than saying that the condition Aⱼ is the ground of Aᵢ, Alston appears
to say that Aⱼ is the condition of being based on a ground. However, even
that reconstruction might not be what Alston has in mind, for he writes:


... that on which the probability of the target belief, Aᵢ, is conditional
differs in the two cases [the case of Alston’s conditional probability and that of
ordinary conditional probability]. For the latter, it is the conditioning
propositions, taken as true. For the former, it is the basing of Aᵢ on a ground of a
certain degree of adequacy.84


Thus Aⱼ is the condition of being based with a certain degree of adequacy
on a ground. Alston continues:


And that degree of adequacy is a function of more than the relation of propo- 
sitional contents. As we have seen, it is also a function of the epistemic sta- 
tus of any beliefs in the ground. So in addition to the difference between a 
proposition-proposition(s) relationship and a belief-ground relationship, even 
the factors relevant to the status of the conditioning item(s) do not exactly
match.85


Here Alston suggests that the difference between the two kinds of
conditional probability, the traditional and the Alstonian one, is that only the
latter accounts for P(Aⱼ). However, we have seen that this is not so. When Aᵢ
is conditioned on Aⱼ, traditional probability theory gives the probability of
Aᵢ via the rule of total probability (2.4), and (2.4) clearly also accounts for
P(Aⱼ).

We conclude that Alston’s concept of epistemic conditional probability 
actually coincides with the concept of conditional probability as it is treated 
in traditional probability theory. None of the three alleged differences that 
Alston describes constitutes a departure from Kolmogorovian orthodoxy. 


83 Ibid., 98.
84 Ibid., 99.
85 Ibid.
Chapter 3
The Probabilistic Regress 


Abstract 

For more than twenty years Clarence Irving Lewis and Hans Reichenbach
pursued an unresolved debate that is relevant to the question of whether
infinite epistemic chains make sense. Lewis, the nay-sayer, held that any 
probability statement presupposes a certainty, but Reichenbach profoundly 
disagreed. We present an example of a benign probabilistic regress, thus 
showing that Reichenbach was right. While in general one lacks a criterion 
for distinguishing a benign from a vicious regress, in the case of probabilis- 
tic regresses the watershed can be precisely delineated. The vast majority 
(‘the usual class’) is benign, while its complement (‘the exceptional class’) 
is vicious. 


3.1 A New Twist 


The previous chapter indicated how intricate the debate about epistemic
justification has become. A mixed bag of knotty details and drawbacks
complicates the subject, giving rise to a variety of different positions. But although
nobody knows what exactly epistemic justification is, the idea that it involves
probabilistic support is widespread among epistemologists of all sorts and
conditions. Internalists, externalists, foundationalists, anti-foundationalists,
evidentialists and reliabilists: most of them assume that ‘Aⱼ justifies Aᵢ’
implies that Aᵢ somehow receives probabilistic support from Aⱼ.

In this chapter and the ones to follow we want to make clear how sig- 
nificant this turn towards probability actually is, and what surprising conse- 
quences it has. The debate about epistemic regresses acquires a completely 
new twist when Kolmogorovian probability is brought into the picture; for 
as we will see a probabilistic regress turns out to be immune to many of the 
objections that have routinely been raised against the traditional regress of 
entailments. The situation is to a certain extent reminiscent of the two causal 
regresses that we encountered in Chapter 1. Whereas a causal series per se 
only makes sense if it has a first member, this is not so for a causal series 
per accidens. Similarly, as we will argue, a traditional regress of entailments 
needs a first member, but a regress of probabilistic support may not. 

In the present chapter we will describe the concept of a probabilistic regress, that is, a regress in which (1.1) of Chapter 1,

$$q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots$$

is reinterpreted as: $q$ is probabilistically supported by $A_1$, which is probabilistically supported by $A_2$, and so on, ad infinitum.¹ It is assumed that every
link in this chain satisfies the condition of probabilistic support (2.1). As we 
have seen, this condition is quite weak, falling considerably short of the title 
‘justification’. But for our purposes this minimal requirement is enough. 

Our exposition of a probabilistic regress takes as its starting point a his- 
torical debate between Hans Reichenbach (1891-1953) and Clarence Irving 
Lewis (1883-1964). Lewis and Reichenbach are both early defenders of the 
view that epistemic justification is probabilistic in character, holding that $A_j$ might justify $A_i$ even if the former does not logically entail the latter but only
provides probabilistic support. They disagree, however, as to the implica- 
tions of this claim. Lewis insists that probabilistic justification must spring 
from a ground that is certain, whereas Reichenbach maintains that proba- 
bilistic justification remains coherent, even if it is not rooted in firm ground. 
The disagreement between Lewis and Reichenbach extended over more than 
two decades, from 1930 until 1952, and it is well documented in letters and 
in journal contributions. 

In Sections 3.2 and 3.3 we will give an overview of the dispute. We first 
describe Lewis’s main claim, viz. that any proposition of the form ‘q is probable’ or ‘q is made probable by $A_1$’ must presuppose a proposition that is certain. Lewis’s argument for this claim is that without such a presupposition we will end up with a probabilistic regress that has the absurd consequence of always yielding probability value zero for $q$. Next we describe
Reichenbach’s objection to this argument. We then explain that Lewis is not 
convinced by it and challenges Reichenbach to produce a counterexample, 


1 The term ‘probabilistic regress’ was coined by Frederik Herzberg (Herzberg 2010).
i.e. a probabilistic regress that yields a number other than zero for the target 
proposition q. 

Reichenbach never took up Lewis’s challenge, but we will meet it in Sec- 
tion 3.4. By presenting a probabilistic regress that converges to a non-zero 
limit, we demonstrate that a target can have a definite and computable value, 
even if it is probabilistically justified by a series that continues ad infinitum. 
In this manner we show that Reichenbach rather than Lewis was correct, and 
also that a probabilistic regress can make sense. 

The counterexample to Lewis in Section 3.4 has a simple, uniform struc- 
ture. In Section 3.5 we offer a nonuniform and thus more general counterex- 
ample. Both counterexamples belong to what we call ‘the usual class’, i.e. 
the class of probabilistic regresses that yield a well-defined probability for 
the target proposition. We distinguish it from ‘the exceptional class’, which 
contains the probabilistic regresses that are not well-defined. In Section 3.6 
we will spell out the conditions for membership of the usual and the excep- 
tional classes. As it turns out, exceptional probabilistic regresses are charac- 
terized by the fact that here probabilistic support comes very close to entail- 
ment. Not surprisingly, therefore, probabilistic regresses in the exceptional 
class need a ground in order to bestow a value on the target, and in that sense 
count as vicious. 

The uniform and the nonuniform counterexamples in 3.4 and 3.5 are rather 
abstract in nature; but in Section 3.7 we offer two real-life probabilistic re- 
gresses, based on the development of bacteria. 


3.2 The Lewis-Reichenbach Dispute 


In 1929 Lewis published his first major work, Mind and the world order. An outline of a theory of knowledge.² Here he starts from the traditional
view that our knowledge is partly mathematical and partly empirical. The 
mathematical part deals with knowledge that is a priori and analytic; the 
empirical part concerns our knowledge of nature. This knowledge of nature, 
says Lewis, is always only probable: 


... all empirical knowledge is probable only ... our knowledge of nature is a knowledge of probabilities.³


2 The present section is based on Peijnenburg and Atkinson 2011.
3 Lewis 1929, 309-310. 
Since the crucial issue for any theory of knowledge is the character of em- 
pirical knowledge, it follows that 


... the problem of our knowledge ... is that of the validity of our probability judgements.⁴


What about the validity of probability statements? In Mind and the world 
order, Lewis stresses time and time again that probability judgements only 
make sense if they are based on something that is certain: 


The validity of probability judgements rests upon ... truths which must be certain.⁵


... the immediate premises are, very likely, themselves only probable, and per- 
haps in turn based upon premises only probable. Unless this backward-leading 
chain comes to rest finally in certainty, no probability-judgment can be valid at all.⁶


Lewis is not the only philosopher who has argued that probability judge- 
ments presuppose certainties. The idea can already be found in David Hume’s 
Treatise of human nature and it has also been defended by, among others, 
Keith Lehrer, Richard Fumerton, and Nicholas Rescher.⁷ Lewis is, however, one of the few who discuss the claim in more detail. His explanation can
be summarized as follows. 

A statement of the form ‘q is probable’ or ‘the probability of q is x’ is in fact elliptical for ‘q is probable, given $A_1$’, or ‘the probability of q given $A_1$ is x’, where x is a number between zero and one. In symbols: the unconditional $P(q) = x$ is elliptical for the conditional $P(q|A_1) = x$. In many cases, $A_1$ is itself only probable, so we obtain ‘$A_1$ is probable’, which is shorthand for ‘$A_1$ is probable, given $A_2$’. Again, if $A_2$ is only probable, we need $A_3$, et cetera. A probabilistic regress threatens. Lewis’s claim is that in the end we must encounter a statement, $p$, that is certain (or has probability 1; we will not distinguish here between these two cases):

$$q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots \leftarrow p.$$


Denying that this is so, and claiming that such a certain p is not needed, 
says Lewis, amounts to making nonsense of the original statement (‘q is 


4 Ibid., 308. 

5 Ibid., 311. 

6 Ibid., 328-329.

7 Hume 1738/1961, 178; Lehrer 1974, 143; Fumerton 2004, 162; Fumerton and 
Hasan 2010; Rescher 2010, 36-37. 
probable’) itself. Thus we can only give a probability value to a target, q, if we suppose that there is a ground or foundation, p, that is certain.⁸

Reichenbach read Mind and the world order soon after it came out. Al- 
though he concurred with many of Lewis’s reasonings, he profoundly dis- 
agreed with the claim that probability statements only make sense if they 
are based on certainties. On July 29, 1930, he sent Lewis a letter, enclosing 
some of his own manuscripts. Unfortunately this letter is now lost. We only 
know of its existence from a reply that Lewis wrote to Reichenbach, dated 
August 26, 1930.⁹ We are unable to infer from this reply what exactly Reichenbach had written, since Lewis mainly writes about the manuscripts that Reichenbach had sent him.¹⁰

Between 1930 and 1940 a correspondence developed, which was partly 
about practical matters (Reichenbach had fled Berlin in 1933 and gone to Istanbul, from where he tried to find an academic position in the U.S.A.), and
partly about Lewis’s claim that probability judgements presuppose certain- 
ties. As far as the latter is concerned, it is clear that Reichenbach’s arguments 
did not convince Lewis, for sixteen years later, in his book An analysis of 
knowledge and valuation, Lewis stresses the same point again: 


If anything is to be probable, then something must be certain. The data which 
themselves support a genuine probability, must themselves be certainties.¹¹


The disagreement between Lewis and Reichenbach reached its height in De- 
cember 1951, at the forty-eighth meeting of the Eastern Division of the 
American Philosophical Association at Bryn Mawr. At that meeting there 
was a symposium on ‘The Given’, where Lewis, Reichenbach and Nelson
Goodman read papers. Their contributions were published a year later in 
The Philosophical Review, and there we learn that Lewis sticks to his guns: 


8 As James Van Cleve has noted, Lewis’s text appears to be ambiguous between two 
readings (Van Cleve 1977, 323-324). According to the first, Lewis says something 
like: ‘The probability of q given p is x, and moreover p is certain’. In symbols:
P(q|p) = x and P(p) = 1. According to the second reading he says: ‘It is certain 
that the probability of q given p is x’, that is P(P(q|p) =x) = 1. It can however be 
proven that the two readings are equivalent, so this ambiguity is merely apparent. 
We will come back to this matter in Chapter 7. 

9 “Your very kind letter of July 29th has reached me, here at my summer address.”
The summer address was, by the way, Briar Hill in New Hampshire, close to Ver- 
mont. 

10 And apparently did not know quite what to do with them: “I find difficulty in 
understanding the ground from which they arise.” 

11 Lewis 1946, 186.
The supposition that the probability of anything whatever always depends on 
something else which is only probable itself, is flatly incompatible with the 
assignment of any probability at all.¹²


But Reichenbach, too, insisted on his own views. Already in his major epis- 
temological work, Experience and prediction, he had found an apt metaphor 
for his anti-foundationalist position: 


All we have is an elastic net of probability relations, floating in open space.¹³


Fifteen years later Reichenbach still had the same conviction. He calls the 
claim of Lewis that probabilities must be grounded in certainties “just one 
of those fallacies in which probability theory is so rich”.¹⁴ In an attempt to
understand the root of the fallacy he writes: 


We argue: if events are merely probable, the statement about their probability 
must be certain, because ... Because of what? I think there is tacitly a concep- 
tion involved according to which knowledge is to be identified with certainty, 
and probable knowledge appears tolerable only if it is embedded in a frame- 
work of certainty. This is a remnant of rationalism.¹⁵


And being a rationalist would of course be a thorn in Reichenbach’s logical- 
empiricist side. Lewis, in turn, rejects the accusation of being an old fash- 
ioned rationalist and replies that, on the contrary, he is trying to save em- 
piricism from what he calls ‘a modernized coherence theory’ like that of his 
opponent. He writes: 


...the probabilistic conception [of Reichenbach] strikes me as supposing that 
if enough probabilities can be got to lean against one another they can all be 
made to stand up. I suggest that, on the contrary, unless some of them can 
stand up alone, they will all fall flat.¹⁶


Who is right in this debate? Some authors, such as James Van Cleve and 
Richard Legum, have argued that it is Lewis.¹⁷ To explain why we dissent,
we will first spell out the argument that Lewis puts forward in support of 
his claim that probability judgements presuppose certainties. It is true that 
the negation of Lewis’s claim leads to an infinite regress, but since not all 
regresses are vicious, an argument is required in order to show that this par- 
ticular regress is of the unacceptable kind. 


12 Lewis 1952, 173.

13 Reichenbach 1938, 192. 

14 Reichenbach 1952, 152. 

15 Ibid.

16 Lewis 1952, 173. 

17 Van Cleve 1977; Legum 1980. 
3.3 Lewis’s Argument


As Mark Pastin correctly notes, the claim that probabilities presuppose cer- 
tainties was repeated by Lewis “throughout his writings but [he] gave little 
attention to defending it”.¹⁸ The most extensive defence can be found in
Mind and the world order, which contains the following argument: 


Nearly all the accepted probabilities rest upon more complex evidence than 
the usual formulations suggest; what are accepted as premises are themselves 
not certain but highly probable. Thus our probability judgement, if made ex- 
plicit, would take the form: the probability that A is B is a/b, because if C 
is D, then the probability that A is B is m/n, and the probability of ‘C is D’ 
is c/d (where m/n × c/d = a/b). But this compound character of probable judgement offers no theoretical difficulty for their validity, provided only that the probability of the premises, when pushed back to what is more and more ultimate, somewhere comes to rest in something certain.¹⁹


In other words, Lewis says that the judgement 


A is B is probable, (3.1)


is elliptical for 


A is B is probable, given C is D. (3.2)


Since we are dealing with empirical knowledge, C is D is itself also only 
probable. The judgement ‘C is D is probable’ is in turn elliptical for ‘C is D 
is probable, given E is F’. And so on. 

We can formalize and quantify (3.1) and (3.2) by 


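$$P(\text{A is B}) = a/b \qquad (3.3)$$

which is elliptical for

$$\begin{aligned} P(\text{A is B}) &= P(\text{A is B} \mid \text{C is D}) \times P(\text{C is D}) \\ &= m/n \times c/d \\ &= a/b, \end{aligned} \qquad (3.4)$$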


where a/b, m/n and c/d are probability values between 1 and 0. Now of 
course the probability that C is D may also be elliptical. If this series were to 


18 Pastin 1975, 410. 
19 Lewis 1929, 327-28. Here ‘A is B’ means something like ‘all A-things are B- 
things’. We have replaced Lewis’s ‘P is Q’ and ‘p/q’ by ‘C is D’ and ‘c/d’.
go on and on, then, because all the factors in the multiplication are probabil- 
ities and thus positive numbers less than one, the probability of the original 
proposition A is B would always tend to zero. But this is ridiculous, so the 
series of probability judgements must come to a stop in a statement that is 
certain. This is Lewis’s argument for his claim that bestowing a probabil- 
ity value on a target presupposes the acceptance of a ground that is certain: 
without such a ground, the probability of the target will go to zero. 

Lewis’s argument is however simply mistaken. For P(A is B) is not ellip- 
tical for the product P(A is B|C is D) x P(C is D), but for the following sum 
of products: 


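$$\begin{aligned} P(\text{A is B}) = {}& P(\text{A is B} \mid \text{C is D}) \times P(\text{C is D}) \\ &+ P(\text{A is B} \mid \neg(\text{C is D})) \times P(\neg(\text{C is D})). \end{aligned} \qquad (3.5)$$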


The first term of (3.5) coincides with (3.4), but (3.5) contains a second term, 
which Lewis forgets. He ignores the fact that, if the probability of A is B 
is conditioned by the probability of C is D, then you can only calculate the 
former probability if you also take into account what that probability is in 
case C is D is false.²⁰ Eq. (3.5) is an instance of the rule of total probability,
which is a theorem of the calculus that Andrey Kolmogorov developed in his 
Grundbegriffe der Wahrscheinlichkeitsrechnung. 
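To see concretely why the second term matters, here is a minimal Python sketch that checks (3.5) on a small joint distribution over the two propositions ‘A is B’ and ‘C is D’. The four joint probabilities are invented for illustration only; nothing in the text fixes them. The sketch computes $P(\text{A is B})$ directly, then Lewis’s single product, then the full two-term sum.

```python
from fractions import Fraction as F

# Invented joint distribution over (A is B, C is D), for illustration only.
joint = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(2, 10),
    (False, False): F(4, 10),
}

def prob(event):
    """Probability of the set of worlds (ab, cd) for which event(ab, cd) holds."""
    return sum((p for (ab, cd), p in joint.items() if event(ab, cd)), F(0))

p_ab = prob(lambda ab, cd: ab)                  # P(A is B), read off directly
p_cd = prob(lambda ab, cd: cd)                  # P(C is D)
p_ab_given_cd = prob(lambda ab, cd: ab and cd) / p_cd
p_ab_given_not_cd = prob(lambda ab, cd: ab and not cd) / (1 - p_cd)

lewis_product = p_ab_given_cd * p_cd                     # first term of (3.5) only
total = lewis_product + p_ab_given_not_cd * (1 - p_cd)   # both terms of (3.5)

print(p_ab, lewis_product, total)   # 2/5 3/10 2/5
```

Only the full sum reproduces the directly computed marginal; the product alone falls short by exactly the omitted term.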

Kolmogorov published his Grundbegriffe only in 1933, four years after Mind and the world order appeared, which might explain Lewis’s mistake. The same cannot, however, be said of Bertrand Russell. In 1948, nineteen years after Mind and the world order, Russell published
Human knowledge: its scope and limits. Part 5 of this book is devoted to 
the concept of probability, and there Russell criticizes several theories of 
probability, including Reichenbach’s theory in his Wahrscheinlichkeitslehre 
of 1935. It is interesting that, quite independently of Lewis (for he does not 
mention him anywhere), Russell claims that attributing a probability value to 
a proposition presupposes a certainty. Moreover, he defends this claim with 
the same erroneous argument that Lewis had used. Russell writes: 


At the first level, we say that the probability that an A will be a B is $m_1/n_1$; at the second level, we assign to this statement a probability $m_2/n_2$, by making it one of some series of similar statements; at the third level, we assign a


20 Mark Pastin seems to interpret Lewis as talking about the probability of the con- 
junction of the propositions ‘A is B’ and ‘C is D’ (Pastin 1975, 413). In this read- 
ing, Eq. (3.5) would be replaced by P((A is B) and (C is D)) = P(A is B|C is D) × P(C is D), and this expression has no second term. However in this case there would
not be a justificatory chain in which one proposition justifies the other. See footnote 
31. 
probability $m_3/n_3$ to the statement that there is a probability $m_2/n_2$ in favour of our first probability $m_1/n_1$; and so we go on forever. If this endless regress could be carried out, the ultimate probability in favour of the rightness of our initial estimate $m_1/n_1$ would be an infinite product

$$\frac{m_2}{n_2} \times \frac{m_3}{n_3} \times \frac{m_4}{n_4} \times \cdots$$

which may be expected to be zero.²¹
In other words, Russell argues that a series of statements like 


$s_1$ = A is B
$s_2$ = The probability of $s_1$ is $m_1/n_1$
$s_3$ = The probability of $s_2$ is $m_2/n_2$


implies that the probability of $s_1$ will tend to zero.²² The argument is the same as that of Lewis: the probability of $s_1$ is the outcome of the multiplica-
tion of an infinite number of factors each of which is smaller than 1. It thus 
fails for precisely the same reason as does Lewis’s argument. If a proposition 


21 Russell 1948, 434; our italics. Where Russell has α and β we have used A and B. It is assumed that $0 < m_i/n_i < 1$ for all i. Presumably Russell, a competent
mathematician, wrote ‘may be expected to be zero’ because he knew that there exist 
infinite products of factors, all less than one, that converge (i.e. that yield well- 
defined, non-zero values). In this connection it is interesting that Quine, in his 1946 
Lectures on David Hume’s Philosophy (Quine 2008), indeed makes the point that 
such a product can be convergent: in fact he gives an explicit example. He fails, 
however, to note that the point is irrelevant, for the probabilities in question should 
not be multiplied together (because of the second term in (3.5)). Thanks to Sander 
Verhaegh for bringing Quine’s lectures to our notice. We return to Quine’s reasoning 
in Chapter 7. 

22 Note that Russell here speaks about higher-order probability statements rather 
than about the probability of a reference class in a conditional probability statement 
(see footnote 8 for the difference). Russell says that such a series of higher-order 
probability statements “leads (one is to suppose) to a limit-proposition, which alone 
we have a right to assert. But I do not see how this limit-proposition is to be ex-
pressed. The trouble is that, as regards all the members of the series before it, we 
have no reason ...to regard them as more likely to be true than to be false; they 
have, in fact, no probability that we can estimate.” (Russell 1948, 435; our italics). 
In other words, Russell suggests that we cannot attribute a probability value to $s_1$ because we are unable to compute the limit of the series. This seems to be at odds with his earlier claim that the value of $s_1$ goes to zero, but we will not dwell on
the matter here. In the next section we will rather specify the limit proposition that 
Russell was vainly trying to express. 
with probability x is conditioned by a proposition with probability y, then the probability of the first proposition is not given by $xy$, as Russell says, but by $xy + x'(1-y)$, where $x'$ is the probability that the first proposition is true if the second is false, and $(1-y)$ is the probability that the second proposition is indeed false. Just like Lewis, Russell forgets the second term in the rule of total probability, namely $x'(1-y)$.
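The difference between the two recipes becomes dramatic when they are iterated along a chain. The following Python sketch is a toy model, not anything found in Lewis or Russell: it assumes that $x = P(A_n|A_{n+1})$ and $x' = P(A_n|\neg A_{n+1})$ are the same at every link, with the illustrative values 3/4 and 3/8, and that the chain ends in a ground of chosen probability. The function names are hypothetical.

```python
from fractions import Fraction as F

ALPHA = F(3, 4)   # x:  P(A_n | A_{n+1}), assumed the same at every link
BETA = F(3, 8)    # x': P(A_n | not A_{n+1}), assumed the same at every link

def product_only(links, ground=F(1)):
    """Lewis/Russell recipe: keep only x*y, dropping the term x'(1 - y)."""
    p = ground
    for _ in range(links):
        p = ALPHA * p
    return p

def total_probability(links, ground=F(1)):
    """Full rule of total probability: x*y + x'(1 - y) at every link."""
    p = ground
    for _ in range(links):
        p = ALPHA * p + BETA * (1 - p)
    return p

for links in (1, 10, 50):
    print(links, float(product_only(links)), float(total_probability(links)))
```

Even with a certain ground, the product-only column shrinks towards zero as the chain lengthens, whereas the column that keeps the term $x'(1-y)$ settles near 3/5; it does so even if the ground is given probability zero.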

Reichenbach notices that Russell makes the mistake, and points it out to 
him in a letter of March 28, 1949.²³ Russell clearly acknowledges his oversight, as we see from his reply three weeks later.²⁴ Lewis, on the other hand,
seems to have persisted in his error, and Reichenbach confronts him with 
this fact in 1951, at the forty-eighth meeting of the American Philosophical 
Association at Bryn Mawr. Lewis appears however not to be impressed by 
Reichenbach’s amendment: 


...even if we accept the correction which Reichenbach urges here, I disbe- 
lieve that it will save his point. For that, I think he must prove that, where 
any regress of probability-values is involved, the progressively qualified frac- 
tion measuring the probability of the quaesitum will converge to some deter- 
minable value other than zero; and I question whether such a proof can be 
given.²⁵


In other words, Lewis fails to see the relevance of the second term in (3.5): he 
simply does not believe that an infinite regress of probabilities can converge 
to some value other than zero. Even if we do take Reichenbach’s amendment 
into account, Lewis still thinks that an infinite series of probability statements 
conditioned by probability statements will always converge to zero. And he 
defies Reichenbach to prove the contrary. As far as we know Reichenbach 
never took up the challenge. Perhaps he planned to, but never got around to 
it; or maybe he had difficulties finding what Russell called “the limit proposi- 
tion” (see footnote 22); or perhaps he simply got tired of the debate. We will 
presumably never know, for in April 1953 Reichenbach died in California of 
a heart attack. 


23 The letter is printed in the volume with selected writings of Hans Reichenbach 
edited by Maria Reichenbach and Robert Cohen (Reichenbach and Cohen 1978, 
405-411). 

24 “I perceive already that you are right as to the mathematical error that I commit-
ted on page 416” (letter from Russell to Reichenbach, April 22, 1949). Page 416 
corresponds to page 434 in reprints of Russell’s book. We are grateful to Mr. L. Lu- 
gar and Ms. B. Arden of the Pittsburgh Archive for sending us a copy of Russell’s 
letter. 

25 Lewis 1952, 172. 
In the next section we will take up Lewis’s gauntlet by presenting a coun- 
terexample to his argument that a “regress of probability-values” always 
tends to zero. This counterexample involves an infinite iteration of the rule 
of total probability. Although this iteration produces a much more compli- 
cated regress than the simple product that Russell and Lewis had envisaged, 
it leads to a perfectly well-defined, and moreover nonzero probability for the 
target proposition. It thus also produces the “limit-proposition” that Russell 
was looking for.²⁶


3.4 A Counterexample 


Let our target proposition $q$ be probabilistically justified by proposition $A_1$. We have seen that the unconditional probability of $q$, namely $P(q)$, can be calculated from the rule of total probability:

$$P(q) = P(q|A_1)P(A_1) + P(q|\neg A_1)P(\neg A_1). \qquad (3.6)$$


To make contact with Lewis’s argument, we can take $q$ to be ‘A is B’ and $A_1$ to be ‘C is D’. If $A_1$ is probabilistically justified by $A_2$, then $P(A_1)$ can be calculated from another instance of the rule,

$$P(A_1) = P(A_1|A_2)P(A_2) + P(A_1|\neg A_2)P(\neg A_2), \qquad (3.7)$$


and if $A_2$ is in turn probabilistically justified by $A_3$ we have to repeat the rule
again, 


26 Dennis Dieks put forward the possibility that Lewis might have been interested 
only in those probabilistic regresses in which the second term may be legitimately 
ignored (Dieks 2015). Dieks’ suggestion is intriguing, but it causes difficulties. First, 
why did Lewis not make this explicit? In his debate with Reichenbach there appear to have been opportunities enough. Second, even if $A_{n+1}$ has been called a reason for $A_n$, we should not overlook the fact that other propositions, contained in the negation of $A_{n+1}$, can well contribute to the justification of $A_n$. As Johan van Benthem phrases it: “[$P(A_n|\neg A_{n+1})$] measures intuitively the ‘bonus’ that $A_n$ receives even if $A_{n+1}$ were untrue. This inclusion might perhaps sound odd if we have just introduced $A_{n+1}$ as reason for $A_n$, but we may, neither here nor in argumentation generally, ignore the fact that a postulated claim can already enjoy support without $A_{n+1}$” (Van Benthem 2015, 148, our translation from the Dutch; cf. Peijnenburg 2015, 205-206).
In any case, if Dieks were correct this would considerably restrict the domain in 
which the Lewisian approach could apply, and it would appear to be inconsistent 
with the probability calculus. 
$$P(A_2) = P(A_2|A_3)P(A_3) + P(A_2|\neg A_3)P(\neg A_3). \qquad (3.8)$$


Can we continue this repetition, thus allowing for propositions being prob- 
abilistically justified by other propositions, being probabilistically justified 
by still other propositions, ad infinitum? It might look as though we cannot. 
How would we ever be able to calculate P(q) if it is the outcome of an infinite 
regress of instances of the rule of total probability? The calculation seems at 
first sight to be too lengthy and too complicated for us to complete. After all, 
insertion of Eq.(3.7), together with 


$$P(\neg A_1) = P(\neg A_1|A_2)P(A_2) + P(\neg A_1|\neg A_2)P(\neg A_2) \qquad (3.9)$$


into the right-hand side of Eq.(3.6) leads to an expression with four terms, 
namely: 
$$\begin{aligned} P(q) = {}& P(q|A_1)P(A_1|A_2)P(A_2) + P(q|\neg A_1)P(\neg A_1|A_2)P(A_2) \\ &+ P(q|A_1)P(A_1|\neg A_2)P(\neg A_2) + P(q|\neg A_1)P(\neg A_1|\neg A_2)P(\neg A_2). \end{aligned} \qquad (3.10)$$


A repetition of this manoeuvre to express $P(A_2)$ and $P(\neg A_2)$ in terms of $P(A_3)$ and $P(\neg A_3)$ would produce no less than eight terms. After $n+1$ steps, the number of terms is $2^{n+1}$, yielding an ungainly expression that seems hard to evaluate in a simple, closed form.
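The growth in the number of terms can be checked mechanically. The Python sketch below expands the nested rule into one product per assignment of truth values to $A_1, \ldots, A_{n+1}$, counts the products, and compares their sum with the value obtained by applying the rule one link at a time. For simplicity it assumes the same conditional probabilities at every link (illustrative values 3/4 and 3/8) and an arbitrary probability 1/3 for the last proposition; the helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

ALPHA, BETA = F(3, 4), F(3, 8)   # illustrative values, assumed equal at every link

def cond(upper, lower):
    """P(A_k has truth value `upper` | A_{k+1} has truth value `lower`)."""
    p_true = ALPHA if lower else BETA
    return p_true if upper else 1 - p_true

def expanded(steps, ground):
    """Sum the 2**steps products, one per truth assignment to A_1..A_steps.

    `ground` is the unconditional probability of the last proposition, A_steps."""
    total, n_terms = F(0), 0
    for values in product((True, False), repeat=steps):
        chain = (True,) + values             # A_0 = q is true in every product
        term = ground if chain[-1] else 1 - ground
        for upper, lower in zip(chain, chain[1:]):
            term *= cond(upper, lower)
        total += term
        n_terms += 1
    return total, n_terms

def nested(steps, ground):
    """Apply the rule of total probability one link at a time."""
    p = ground
    for _ in range(steps):
        p = ALPHA * p + BETA * (1 - p)
    return p

for steps in (1, 2, 3):
    value, count = expanded(steps, F(1, 3))
    print(steps, count, value == nested(steps, F(1, 3)))   # 2, 4, 8 products
```

With two steps the four products are exactly the four terms of (3.10).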

There is however a way to reduce this complication of the rapidly increasing number of terms. In explaining this we first simplify the notation by abbreviating (3.6): we set the two conditional probabilities, $P(q|A_1)$ and $P(q|\neg A_1)$, equal to $\alpha$ and $\beta$:

$$\alpha = P(q|A_1), \qquad \beta = P(q|\neg A_1). \qquad (3.11)$$
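Now $P(q)$ becomes:

$$\begin{aligned} P(q) &= \alpha P(A_1) + \beta P(\neg A_1) \\ &= \alpha P(A_1) + \beta[1 - P(A_1)] \\ &= \beta + (\alpha-\beta)P(A_1). \end{aligned} \qquad (3.12)$$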


Clearly, we can only compute $P(q)$ if we know $P(A_1)$. Of course, we also have to know the values of the conditional probabilities $\alpha$ and $\beta$. Their status is however rather different from that of the unconditional probabilities, and we will come back to this matter in detail in Chapter 4. At this juncture, we simply assume that $\alpha$ and $\beta$ are given, and that they are the same from link to link (the latter assumption is dropped in the next section). But what is the value of $P(A_1)$? We do not know. However, we do know that $A_1$ is probabilistically justified by $A_2$, and so we can calculate $P(A_1)$ in terms of $P(A_2)$, and so on:


$$\begin{aligned} P(A_1) &= \beta + (\alpha-\beta)P(A_2) \\ P(A_2) &= \beta + (\alpha-\beta)P(A_3) \\ P(A_3) &= \beta + (\alpha-\beta)P(A_4). \end{aligned}$$


We can now see how to get rid of the unknown unconditional probabilities, 
namely by nesting the formulas. Thus we can remove $P(A_1)$ by substituting its value into (3.12), so that we obtain:


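$$\begin{aligned} P(q) &= \beta + (\alpha-\beta)P(A_1) \\ &= \beta + (\alpha-\beta)[\beta + (\alpha-\beta)P(A_2)] \\ &= \beta + \beta(\alpha-\beta) + (\alpha-\beta)^2 P(A_2). \end{aligned} \qquad (3.13)$$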


Next, by inserting the value of P(A2) into (3.13) we attain 


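$$\begin{aligned} P(q) &= \beta + \beta(\alpha-\beta) + (\alpha-\beta)^2[\beta + (\alpha-\beta)P(A_3)] \\ &= \beta + \beta(\alpha-\beta) + \beta(\alpha-\beta)^2 + (\alpha-\beta)^3 P(A_3), \end{aligned} \qquad (3.14)$$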


by which we got rid of P(A2). And so on. After a finite number m of steps 
we obtain the following formula: 


$$P(q) = \beta + \beta(\alpha-\beta) + \beta(\alpha-\beta)^2 + \cdots + \beta(\alpha-\beta)^m + (\alpha-\beta)^{m+1}P(A_{m+1}). \qquad (3.15)$$
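Eq. (3.15) can be sanity-checked against the step-by-step nesting from which it was derived. In this minimal Python sketch (function names hypothetical), `nested` starts from a chosen value of $P(A_{m+1})$ and applies $P(A_n) = \beta + (\alpha-\beta)P(A_{n+1})$ repeatedly, while `closed_form` evaluates the right-hand side of (3.15) directly. The values $\alpha = 3/4$ and $\beta = 3/8$ are illustrative.

```python
from fractions import Fraction as F

def nested(alpha, beta, m, ground):
    """Start from P(A_{m+1}) = ground and apply the one-link rule m + 1 times."""
    p = ground
    for _ in range(m + 1):
        p = beta + (alpha - beta) * p
    return p

def closed_form(alpha, beta, m, ground):
    """Right-hand side of Eq. (3.15)."""
    g = alpha - beta
    series = sum((beta * g**k for k in range(m + 1)), F(0))
    return series + g**(m + 1) * ground

alpha, beta = F(3, 4), F(3, 8)
for m in range(5):
    assert nested(alpha, beta, m, F(1)) == closed_form(alpha, beta, m, F(1))
    print(m, float((alpha - beta) ** (m + 1)))   # weight carried by P(A_{m+1})
```

The printed column is the factor multiplying the unknown $P(A_{m+1})$; it is what shrinks as the chain grows.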
Eq.(3.15) is the beginning of the “regress of probability-values” that Lewis 
is talking about. His argument is that, if this series is continued ad infinitum, 
P(q) will always tend to zero, notwithstanding the fact that Reichenbach’s 
correction has been taken into account. This is presumably why Lewis com- 
ments: “I disbelieve that it [the addition of the second term] will save his 
point.” Let us see whether Lewis’s disbelief is justified. 

There are two things that should be noted about (3.15). The first is that it contains only one factor of which the value is unknown. This is $P(A_{m+1})$, i.e. the probability of the first proposition, $A_{m+1}$, in this finite series. Since all the probabilities in the series are ultimately computed on the basis of this unconditional probability, it seems that we must know its value in order to be able to calculate $P(q)$. The second thing is that, as $m$ gets bigger and bigger, so that the justificatory chain becomes longer and longer, $(\alpha-\beta)^{m+1}$ gets smaller and smaller without limit, finally converging to zero (provided that $\alpha - \beta < 1$, which holds unless $\alpha = 1$ and $\beta = 0$). But of course, if $(\alpha-\beta)^{m+1}$ converges to zero, then $(\alpha-\beta)^{m+1}P(A_{m+1})$ dwindles away to nothing too, for $P(A_{m+1})$ cannot be greater than 1. The right-hand side of
Eq.(3.15) is a sum, and if a term in a sum goes to zero, it does not contribute 
in the limit. With an infinite number of steps, the terms that remain are 


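$$\begin{aligned} P(q) &= \beta + \beta(\alpha-\beta) + \beta(\alpha-\beta)^2 + \cdots \\ &= \beta\left[1 + (\alpha-\beta) + (\alpha-\beta)^2 + \cdots\right] \\ &= \beta\sum_{n=0}^{\infty}(\alpha-\beta)^n. \end{aligned} \qquad (3.16)$$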


Since $|\alpha - \beta| < 1$, the sum here is a convergent geometric series which we can evaluate:

$$P(q) = \frac{\beta}{1 - \alpha + \beta}. \qquad (3.17)$$

In general, (3.17) does not yield zero. For example, if $\alpha$ is 3/4 and $\beta$ is 3/8, then $P(q)$ is 3/5.²⁷

We conclude that Lewis is mistaken. It is not the case that a “regress of probability values” always yields zero. We have just seen an example of such a series, consisting in a sum with an infinite number of terms, that yields a number other than zero. Since Lewis’s statement is invalid, it cannot support his main claim that probability statements only make sense if they presuppose certainties.²⁸
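A few lines of Python make the convergence visible. The sketch below uses the example values $\alpha = 3/4$ and $\beta = 3/8$, evaluates the finite formula (3.15) for growing $m$ with two opposite choices for the unknown $P(A_{m+1})$ (a certain ground and an impossible one), and compares the result with (3.17). It also checks that the limit satisfies $p = \beta + (\alpha-\beta)p$, the fixed-point equation mentioned in footnote 27. The function name `p_q` is hypothetical.

```python
from fractions import Fraction as F

alpha, beta = F(3, 4), F(3, 8)       # values from the example in the text
limit = beta / (1 - alpha + beta)     # Eq. (3.17)

def p_q(m, ground):
    """Finite regress, Eq. (3.15), with P(A_{m+1}) = ground."""
    g = alpha - beta
    return sum((beta * g**k for k in range(m + 1)), F(0)) + g**(m + 1) * ground

print(limit)                                   # 3/5
assert limit == beta + (alpha - beta) * limit  # fixed point of one link

for m in (0, 5, 20):
    # a certain ground (1) versus an impossible one (0)
    print(m, float(p_q(m, F(1))), float(p_q(m, F(0))))
```

Whatever value is chosen for the ground, both columns close in on 0.6 as $m$ grows: the influence of $P(A_{m+1})$ is carried by the factor $(\alpha-\beta)^{m+1}$ alone.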


3.5 A Nonuniform Probabilistic Regress 


The counterexample in the previous section is a very special case. For in 
demonstrating that a probabilistic regress makes sense, we have assumed 


27 Eq.(3.17) gives in fact the fixed point of a Markov process. The stochastic matrix 
governing the process is regular, and the iteration is guaranteed by Markov theory to 
converge to the solution of the fixed-point equation, $p_* = \beta + (\alpha - \beta)p_*$. However, this quick
route to (3.17) only works when the conditional probabilities are the same from step 
to step: in the general case that we consider in the next section Markov theory does 
not help, which is why we have not used it here. We shall discuss fixed points more 
fully in Section 8.4 and Appendix D.

28 This example shows that James Van Cleve’s defence of Lewis, and thereby his 
attack on Reichenbach, is mistaken (Van Cleve 1977). Van Cleve argues that an in- 
finite iteration of the rule of total probability must be vicious, because “we must 
complete it before we can determine any probability at all” (ibid., 328). But our 
counterexample to Lewis demonstrates that an infinite iteration may well be com- 
pletable, in the sense that it is convergent and can be summed explicitly, yielding a 
definite value for P(q). 
that the conditional probabilities are uniform, i.e. that they remain the same throughout the entire justificatory chain. Such an assumption is of course rarely fulfilled. It is very uncommon that the degree to which proposition $q$ is probabilistically supported by $A_1$ is the same as the degree to which $A_1$ is probabilistically supported by $A_2$, and so on.

However, it is possible to construct counterexamples without making the 
assumption that the conditional probabilities are uniform. The rule of total 
probability relating $A_n$ to $A_{n+1}$ is


$$P(A_n) = P(A_n|A_{n+1})P(A_{n+1}) + P(A_n|\neg A_{n+1})P(\neg A_{n+1}),$$


or, with the abbreviation of the conditional probabilities as $\alpha$ and $\beta$, as in the
previous section: 


$$P(A_n) = \alpha P(A_{n+1}) + \beta P(\neg A_{n+1}).$$


In the nonuniform case the conditional probabilities differ from one link to 
another, so we have to add an index n to $\alpha$ and $\beta$:


$$P(A_n) = \alpha_n P(A_{n+1}) + \beta_n P(\neg A_{n+1}) = \beta_n + \gamma_n P(A_{n+1}), \tag{3.18}$$


where $\alpha_n$, $\beta_n$ and $\gamma_n$ are defined as follows:


$$\alpha_n = P(A_n|A_{n+1}), \qquad \beta_n = P(A_n|\neg A_{n+1}), \qquad \gamma_n = \alpha_n - \beta_n. \tag{3.19}$$


Imagine a finite probabilistic chain $A_0, A_1, \ldots, A_{m+1}$, where again $A_0$ is
probabilistically supported by $A_1$, which is probabilistically supported by $A_2$, and
so on. For notational convenience we have temporarily used $A_0$ for the
target proposition q and $A_{m+1}$ for the grounding proposition p. It is possible to
concatenate all the instances of the rule of total probability to yield, for any
$m > 0$,


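$$P(A_0) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\gamma_1\cdots\gamma_{m-1}\beta_m + \gamma_0\gamma_1\cdots\gamma_m P(A_{m+1}). \tag{3.20}$$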


Formula (3.20), of which a proof is given in Appendix A.1, is the nonuniform 
counterpart of formula (3.15) in the uniform case. 
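Formula (3.20) is what one obtains by applying (3.18) over and over, from the last proposition back to the target. The following minimal Python sketch offers a consistency check. It evaluates a short chain in two ways: by backward iteration of (3.18), and by the explicit sum (3.20). It uses exact fractions so that the two results can be compared exactly. The three pairs of conditional probabilities and the value of $P(A_3)$ are arbitrary illustrative numbers of our own, not taken from the text.

```python
from fractions import Fraction

def target_by_recursion(alphas, betas, p_last):
    """Apply (3.18) link by link, from A_{m+1} back to A_0."""
    prob = p_last
    for alpha, beta in zip(reversed(alphas), reversed(betas)):
        prob = alpha * prob + beta * (1 - prob)  # rule of total probability
    return prob

def target_by_expansion(alphas, betas, p_last):
    """Evaluate the right-hand side of (3.20) term by term."""
    total = Fraction(0)
    gamma_product = Fraction(1)  # gamma_0 * ... * gamma_{n-1}; the empty product is 1
    for alpha, beta in zip(alphas, betas):
        total += gamma_product * beta
        gamma_product *= alpha - beta
    return total + gamma_product * p_last  # add the remainder term

# Hypothetical alpha_0..alpha_2, beta_0..beta_2 (with alpha_n > beta_n) and P(A_3)
alphas = [Fraction(9, 10), Fraction(3, 4), Fraction(2, 3)]
betas = [Fraction(1, 5), Fraction(1, 4), Fraction(1, 3)]
p_last = Fraction(1, 2)

print(target_by_recursion(alphas, betas, p_last))  # 11/20
print(target_by_expansion(alphas, betas, p_last))  # 11/20
```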
We have seen that, notwithstanding Lewis’s opinion, the extension of the
finite (3.15) to an infinite chain can be envisaged: in the uniform case the
infinite extension is well-defined if the extreme values 0 and 1 for the
conditional probabilities are excluded. Does it make sense to extend (3.20) to
an infinite number of links? Can a probabilistic regress in the nonuniform 
case also be well-defined and moreover yield a nonzero value for the tar- 
get? Again, one example is enough to refute Lewis’s argument in this more 
general setting, and here it is: 


$$\alpha_n = 1 - \frac{1}{n+2} + \frac{1}{n+3}, \qquad \beta_n = \frac{1}{n+3}, \qquad \gamma_n = 1 - \frac{1}{n+2}. \tag{3.21}$$
In (3.21) $\alpha_n$ and $\beta_n$ depend nontrivially on n. The resulting infinite series is
not a geometric series, as it was in the uniform case that was introduced in 
Section 3.4. Nevertheless, as is shown in Appendix A.5, when we insert the 
formulae (3.21) into (3.20) we can work out the sum explicitly, obtaining 


$$P(A_0) = \frac{3}{4} - \frac{2m+5}{2(m+2)(m+3)} + \frac{1}{m+2}\,P(A_{m+1}). \tag{3.22}$$


In the limit that m goes to infinity, the second and the third terms on the
right-hand side of (3.22), namely $-\frac{2m+5}{2(m+2)(m+3)}$ and $\frac{1}{m+2}P(A_{m+1})$, both go to
zero. Thus only the term $\frac{3}{4}$ survives in the limit, so that $P(A_0)$, that is the
probability of the target, P(q), equals $\frac{3}{4}$. Here then is a new and more general
case that invalidates Lewis’s argument that an infinite probabilistic regress
must yield zero.
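The closed form (3.22) and its limit can be checked numerically. The minimal Python sketch below assumes only the definitions (3.21). It iterates (3.18) backward from an arbitrary value of $P(A_{m+1})$ and compares the result with (3.22). For growing m, the target probability approaches 3/4 whether the last proposition is given probability 0 or 1. The function names are ours, introduced for illustration.

```python
from fractions import Fraction

def alpha(n):
    """alpha_n of example (3.21)."""
    return 1 - Fraction(1, n + 2) + Fraction(1, n + 3)

def beta(n):
    """beta_n of example (3.21)."""
    return Fraction(1, n + 3)

def target(m, p_last):
    """P(A_0) for the chain A_0, ..., A_{m+1}, by backward use of (3.18)."""
    prob = p_last
    for n in range(m, -1, -1):
        prob = beta(n) + (alpha(n) - beta(n)) * prob
    return prob

def closed_form(m, p_last):
    """Right-hand side of (3.22)."""
    return (Fraction(3, 4)
            - Fraction(2 * m + 5, 2 * (m + 2) * (m + 3))
            + Fraction(1, m + 2) * p_last)

for m in (0, 10, 100, 1000):
    low, high = target(m, Fraction(0)), target(m, Fraction(1))
    assert low == closed_form(m, Fraction(0)) and high == closed_form(m, Fraction(1))
    print(m, float(low), float(high))  # both columns approach 0.75
```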


3.6 Usual and Exceptional Classes 


The above examples not only illustrate that Lewis was mistaken, but also that 
a probabilistic regress can have a limit and in that sense be benign. But what 
are the conditions under which this is so? When exactly does a probabilistic 
regress yield a well-defined value for the target proposition? 

In general there exist two conditions. Each of them is necessary, and to- 
gether they are sufficient. Look again at our finite nonuniform chain, (3.20): 


$$P(A_0) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\gamma_1\cdots\gamma_{m-1}\beta_m + \gamma_0\gamma_1\cdots\gamma_m P(A_{m+1}).$$


The right-hand side of this equation consists of two parts, namely the sum of 
conditional probabilities, 
$$\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\gamma_1\cdots\gamma_{m-1}\beta_m,$$


and the remainder term, 


$$\gamma_0\gamma_1\cdots\gamma_m P(A_{m+1}).$$


The first condition for a benign probabilistic regress is that the series of con- 
ditional probabilities converges in the limit. The second condition is that, as 
m is taken to infinity, the remainder term goes to zero. 

As we prove in Appendix A.3, the first condition is always satisfied, 
given that we assume probabilistic support, i.e. the constraint
$P(A_n|A_{n+1}) > P(A_n|\neg A_{n+1})$ for all n. No matter whether we are dealing with uniform or
with nonuniform conditional probabilities, the infinite series 


$$\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \cdots \tag{3.23}$$


always converges. However, the matter is different as far as the second con- 
dition is concerned. This condition is satisfied in the uniform situation (with 
the restriction that $\alpha$ is not equal to one and $\beta$ is not equal to zero), but it
is not always satisfied in the nonuniform situation. We shall call the class of
cases where both conditions are fulfilled the usual class.²⁹ In the usual class
the probability of the target is equal to the following convergent series of 
terms, each of which is a function of the conditional probabilities only: 


$$P(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \cdots \tag{3.24}$$


The class of cases in which only the first requirement is fulfilled we will 
call the exceptional class. Regresses in the exceptional class do not furnish 
counterexamples to Lewis’s conclusion; those in the usual class, on the
other hand, do so, on condition that at least one of the $\beta_n$ is nonzero.

When does a nonuniform probabilistic regress fall within the exceptional 
class? For our purpose this question is of course important, since it marks
the watershed between probabilistic regresses which are benign (in the sense 
that they yield an exact and well-defined value for the target) and those that 
are not (in the sense that they only yield such a number if they have a first 


29 In the usual class the infinite series (3.23) converges even if one relaxes the con- 
dition of probabilistic support. However, since we are interested in justification, of 
which probabilistic support is a necessary condition, this extension of the domain 
of convergence is not required for our purposes. Moreover, the condition of proba- 
bilistic support is needed for our conception of epistemic justification as a trade-off 
(see Chapter 5) as well as for convergence in the probabilistic networks discussed 
in Chapter 8. 
member, a ground). Clearly the answer to this question depends on whether
the remainder term vanishes in the limit. We have seen that this will be the
case if the factor $\gamma_0\gamma_1\cdots\gamma_m$ vanishes as m goes to infinity. For then the
remainder term $\gamma_0\gamma_1\cdots\gamma_m P(A_{m+1})$ will die out, since $P(A_{m+1})$, the probability
of the grounding proposition, cannot be greater than one.

But when exactly does $\gamma_0\gamma_1\cdots\gamma_m$ go to zero? That is the key question. As
we show in Appendix A.4, the answer depends entirely on the asymptotic
behaviour of $\alpha_n$ and $\beta_n$. The factor $\gamma_0\gamma_1\cdots\gamma_m$ goes to zero if and only if
$\alpha_n$ does not tend to one more quickly than 1/n tends to zero, or $\beta_n$ does
not tend to zero more quickly than 1/n tends to zero. If at least one of these
disjuncts applies, then the nonuniform probabilistic regress falls within the
usual class. It then yields a unique probability value for the target proposition,
$A_0$ or q, which does not depend on an inaccessible unconditional probability
at infinity. That is, it does not depend on the value of $P(A_{m+1})$ (or P(p))
in Eq.(3.20) in the limit that m goes to infinity.³⁰ A nonuniform
probabilistic regress within this usual class constitutes a counterexample to
Lewis’s argument. A specific instance is provided by the example (3.21), for
this lies in the usual class, since the remainder term in (3.22), $\frac{1}{m+2}P(A_{m+1})$,
goes to zero as m goes to infinity. In this limit the right-hand side of (3.22)
tends to $\frac{3}{4}$.

If, however, $\alpha_n$ goes to one very quickly and $\beta_n$ goes to zero very quickly
as n tends to infinity, more quickly in fact than 1/n tends to zero, then the 
nonuniform probabilistic regress belongs to the exceptional class. In this case 
the regress does not result in a unique, well-defined probability value for the 
target proposition, since the unknown probability of the ground still plays a 
significant role. The regress is now vicious in the sense that the probability 
of the target depends in part on the inaccessible ground, and it would not 
form a counterexample to Lewis’s foundationalist argument. 

An example of a regress in the exceptional class is as follows, 


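$$\beta_n = \frac{1}{(n+2)(n+3)}, \qquad \gamma_n = 1 - \frac{1}{(n+2)^2}, \tag{3.25}$$

so that

$$\alpha_n = \beta_n + \gamma_n = 1 - \frac{1}{(n+2)^2(n+3)}.$$

Here $1-\alpha_n$ and $\beta_n$ both tend to zero as n tends to infinity more quickly
than 1/n tends to zero, which shows that the example is indeed a member of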


30 That the resulting system is consistent, in the sense that there exists at least one 
assignment of probabilities for all possible conjunctions of the propositions A, has 
been demonstrated by Frederik Herzberg (Herzberg 2013). 
the exceptional class. In Appendix A.6 we work out the expression for the
probability of the target proposition, obtaining 


$$P(A_0) = \frac{3}{8} - \frac{2m+5}{4(m+2)(m+3)} + \frac{m+3}{2(m+2)}\,P(A_{m+1}). \tag{3.26}$$


In this case the remainder term, $\frac{m+3}{2(m+2)}P(A_{m+1})$, does not vanish in the limit. It
becomes formally one half times the limit of $P(A_{m+1})$ as m tends to infinity,
which is ill-defined.
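The contrast with the usual class can be made visible numerically. The minimal Python sketch below assumes the definitions (3.25). It iterates (3.18) backward and checks the result against (3.26), giving the last proposition probability 0 in one run and 1 in the other. However long the chain, the two target values stay roughly one half apart, so the value assigned to the ground keeps mattering. The function names are ours.

```python
from fractions import Fraction

def beta(n):
    """beta_n of example (3.25)."""
    return Fraction(1, (n + 2) * (n + 3))

def gamma(n):
    """gamma_n of example (3.25)."""
    return 1 - Fraction(1, (n + 2) ** 2)

def target(m, p_last):
    """P(A_0) by backward use of (3.18), starting from P(A_{m+1}) = p_last."""
    prob = p_last
    for n in range(m, -1, -1):
        prob = beta(n) + gamma(n) * prob
    return prob

def closed_form(m, p_last):
    """Right-hand side of (3.26)."""
    return (Fraction(3, 8)
            - Fraction(2 * m + 5, 4 * (m + 2) * (m + 3))
            + Fraction(m + 3, 2 * (m + 2)) * p_last)

for m in (10, 100, 1000):
    low, high = target(m, Fraction(0)), target(m, Fraction(1))
    assert low == closed_form(m, Fraction(0)) and high == closed_form(m, Fraction(1))
    print(m, float(low), float(high))  # the gap stays close to 1/2 instead of closing
```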

A probabilistic regress in the exceptional class is characterized by the fact 
that it is actually very close to a regress of entailments, i.e. to the ‘classical’ 
regress, in which $A_{n+1}$ entails $A_n$ for all n. It is therefore to be expected
that a straightforward classical regress will also fail to provide us with a
counterexample to Lewis’s claim, and this is indeed the case. Here is how a
classical regress looks in our probabilistic formalism. If $A_{n+1}$ entails $A_n$ for
all n, then


$$\alpha_n = P(A_n|A_{n+1}) = 1,$$


and it is shown in Appendix A.7 that (3.20) reduces in this case to 


$$P(\neg A_0) = \gamma_0\gamma_1\cdots\gamma_m P(\neg A_{m+1}), \tag{3.27}$$


for any m. We have to consider various possibilities for the behaviour of 
$$\beta_n = P(A_n|\neg A_{n+1})$$


as n tends to infinity. If $\beta_n$ were to tend to zero no more quickly than 1/n
does, the product $\gamma_0\gamma_1\cdots\gamma_m$ in (3.27) would tend to zero as m tends to
infinity, so $P(\neg A_0) = 0$, irrespective of the behaviour of $P(\neg A_{m+1})$. Moreover it
follows also that $P(\neg A_n) = 0$ for all n, which means that $\beta_n$ is not defined.
This is inconsistent, so we conclude that after all $\beta_n$ must tend to zero more
quickly than 1/n. But then the product $\gamma_0\gamma_1\cdots\gamma_m$ tends to some nonzero
limit, and so $P(\neg A_0)$ is not uniquely determined, since $P(\neg A_{m+1})$ can be
assigned no particular limit as m goes to infinity. The regress of entailments, or
implications, is thus necessarily in the exceptional class.
A very special case is when 


$$\beta_n = P(A_n|\neg A_{n+1}) = 0 \tag{3.28}$$


for all n. We then have $P(\neg A_0) = P(\neg A_n)$ for all n, so all the
probabilities $P(A_n)$ have the same, undetermined value. Eq.(3.28) implies that
$P(\neg A_n|\neg A_{n+1}) = 1$, which is to say that $\neg A_{n+1}$ entails $\neg A_n$, which of course
means that $A_n$ entails $A_{n+1}$ (up to measure zero). If $\alpha_n = 1$ and $\beta_n = 0$, then
$A_n$ implies, and is implied by, $A_{n+1}$: there is a regress of bi-implication all
the way along the chain. All the probabilities are the same, but the value is 
undetermined by the regress. Such a regress of bi-implication is vicious in 
our sense, for here the truth value of the target cannot be determined in the 
absence of the truth value of the first member. 

To summarize, the system of conditional probabilities belongs to the usual 
class if and only if $1-\alpha_n$ or $\beta_n$ does not tend to zero more quickly than 1/n
tends to zero. On the other hand, if $1-\alpha_n$ and $\beta_n$ both tend to zero more
quickly than 1/n, then the system belongs to the exceptional class, and the
unconditional probabilities of the propositions are not determined. The
situation in which $\alpha_n$ is nearly one, and $\beta_n$ is nearly zero, is close to the case
of bi-implication. We might therefore call the exceptional class the case of
quasi-bi-implication.
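The watershed at 1/n can be illustrated (though not proved) by computing the factor $\gamma_0\gamma_1\cdots\gamma_m$ directly. Since $\gamma_n = 1 - [(1-\alpha_n) + \beta_n]$, the Python sketch below lets a hypothetical function `shortfall(n)` stand for $(1-\alpha_n) + \beta_n$. The "slow" shortfall $1/(n+2)$ reproduces the $\gamma_n$ of (3.21), and its product equals $1/(m+2)$, which goes to zero. The "fast" shortfall $1/(n+2)^2$ reproduces the $\gamma_n$ of (3.25), and its product equals $(m+3)/(2(m+2))$, which tends to one half.

```python
from fractions import Fraction

def gamma_product(m, shortfall):
    """gamma_0 * ... * gamma_m, where gamma_n = 1 - shortfall(n).

    In general gamma_n = alpha_n - beta_n = 1 - [(1 - alpha_n) + beta_n],
    so shortfall(n) stands for (1 - alpha_n) + beta_n.
    """
    product = Fraction(1)
    for n in range(m + 1):
        product *= 1 - shortfall(n)
    return product

def slow(n):
    """Shortfall tending to zero like 1/n: the gamma_n of (3.21), usual class."""
    return Fraction(1, n + 2)

def fast(n):
    """Shortfall tending to zero faster than 1/n: the gamma_n of (3.25), exceptional class."""
    return Fraction(1, (n + 2) ** 2)

for m in (10, 100, 1000):
    print(m, float(gamma_product(m, slow)), float(gamma_product(m, fast)))
```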


3.7 Barbara Bacterium 


In this chapter we have introduced the concept of a probabilistic regress, that 
is an epistemic chain of the form 


$$q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots$$


where the arrow is interpreted in terms of probabilistic support. We examined 
Lewis’s view that such a regress is absurd, since it allegedly implies that 
the probability of q is zero. According to Lewis, the only way to avoid the 
absurdity was to stop at a proposition, p, which is certain: 


$$q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots \leftarrow p.$$


We have opposed Lewis’s argument by giving counterexamples, i.e. prob- 
abilistic regresses which yield a unique, nonzero probability value for the 
target. Some of these regresses were based on uniform conditional probabil- 
ities, others on nonuniform ones. 

All our counterexamples were abstract. This is somewhat unfortunate, 
since a familiar objection to infinite regresses is that they are not concrete 
and lack practical relevance. The objection becomes even more pressing if 
one distinguishes (as we did not do here but will do in later chapters) be- 
tween propositions and beliefs. Propositions are abstract entities, but beliefs 
are propositional attitudes that people really have. Whereas the idea of an 
infinite propositional regress might sound not unreasonable, an infinite
doxastic regress seems a contradiction in terms. Where could we ever find a
doxastic series of infinite length? 

In the next chapters we will come back to this objection, and then we will 
also discuss the distinction between a propositional and a doxastic regress. 
At this juncture we will restrict ourselves to showing that a probabilistic 
regress of propositions also is relevant to a real-life situation. 

Imagine that we are trying to develop a new medicine to cure a disease. 
In this connection, we want to know whether a particular bacterium has a 
certain trait, T. Bacteria reproduce asexually, so one parent, the ‘mother’
bacterium, alone produces offspring. After having carried out many
experiments, one day we take from a batch a particular bacterium, which we call
Barbara. From our experiments we know that the probability that Barbara
has T is considerably greater if her mother has T than if her mother lacks it.
So if q is ‘Barbara has T’ and $A_1$ is ‘Barbara’s mother has T’, then we can
say that $A_1$ probabilistically supports q. It is not certain that Barbara has T if
her mother has the trait, but on the other hand Barbara could have T even if
her mother does not have it. Thus $1 > P(q|A_1) > P(q|\neg A_1) > 0$.

The unconditional probability of Barbara having T is given by 


$$P(q) = P(q|A_1)P(A_1) + P(q|\neg A_1)P(\neg A_1).$$


Whereas the conditional probabilities in this equation, $P(q|A_1)$ and $P(q|\neg A_1)$,
may be assumed to have been determined from our experiments, obtaining
$P(A_1)$ is a problem. What is the probability that Barbara’s mother has T? We
know that it is given by 


$$P(A_1) = P(A_1|A_2)P(A_2) + P(A_1|\neg A_2)P(\neg A_2),$$


where $P(A_2)$ is the probability that Barbara’s grandmother has T, which
in turn is conditioned by $P(A_3)$, the probability that Barbara’s
great-grandmother has T.³¹

It will be clear that we can only compute P(q) if we know P(A3). And the 
situation remains the same, even if we add more and more instances of the 
rule of total probability, going further and further back in Barbara’s ancestry. 
It seems we are only able to compute the probability that Barbara has T if 
we know what is the unconditional probability that her primordial mother 
had T. So at first sight it looks as though foundationalists are right: if q is 
probabilistically justified by $A_1$, which is probabilistically justified by $A_2$,


31 In the reading of Pastin the probability intended by Lewis would be $P(q \wedge A_1)$,
see footnote 20. But this is neither the probability of interest nor does it fit what is 
at stake in the debate between Lewis and Reichenbach. 
et cetera, then we have to know for sure the probability of the grounding
proposition in order to be able to calculate the probability of q. 

This impression, intuitive as it may seem, is however incorrect, and we 
have already seen why. The chain $q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3$ leads to:


$$P(q) = \beta + \beta(\alpha-\beta) + \beta(\alpha-\beta)^2 + (\alpha-\beta)^3 P(A_3),$$


see (3.14). Going infinitely far back into Barbara’s ancestry, we obtain (3.16): 


$$P(q) = \beta + \beta(\alpha-\beta) + \beta(\alpha-\beta)^2 + \cdots.$$


This does not have a grounding proposition p. A primordial mother of Bar- 
bara makes no contribution, yet we are able to calculate the probability that 
Barbara herself has T, and this probability, notwithstanding Lewis’s opinion, 
is not zero. 

Let $A_n$ be the proposition: ‘Barbara’s ancestor in generation n has T’. Let
the probability that a bacterium has T if her mother has T be 0.99, and the
probability that a bacterium has T if her mother lacks it be 0.02. So
$\alpha = P(A_n|A_{n+1}) = 0.99$, $\beta = P(A_n|\neg A_{n+1}) = 0.02$, and hence $\gamma = \alpha - \beta = 0.97$.
Now (3.16) becomes:

$$P(q) = \frac{\beta}{1-\gamma} = \frac{\beta}{1-\alpha+\beta},$$

in agreement with (3.17). With the numbers chosen for $\alpha$ and $\beta$, we can now
calculate the probability that Barbara has T: it is $0.02/0.03 = 2/3$.
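The Barbara computation can also be carried out by brute force. The minimal Python sketch below uses the text's values $\alpha = 0.99$ and $\beta = 0.02$. It cuts the ancestry off after a given number of generations, at an ancestor to whom we arbitrarily assign probability 0 or 1 of having T (these two extreme choices are ours). Both runs approach the value 2/3 given by (3.17) as the cut-off recedes.

```python
ALPHA = 0.99  # P(A_n | A_{n+1}) in the Barbara example
BETA = 0.02   # P(A_n | not A_{n+1})

def prob_barbara(generations, p_ancestor):
    """P(q) when the chain is cut off 'generations' links back, at an
    ancestor whose probability of having T is p_ancestor."""
    prob = p_ancestor
    for _ in range(generations):
        prob = BETA + (ALPHA - BETA) * prob  # (3.18) with uniform alpha and beta
    return prob

limit = BETA / (1 - ALPHA + BETA)  # expression (3.17)
print(round(limit, 6))  # 0.666667

for generations in (1, 10, 100, 1000):
    print(generations, prob_barbara(generations, 0.0), prob_barbara(generations, 1.0))
```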

The foregoing example made use of uniform conditional probabilities. As 
an example of a nonuniform probabilistic regress, suppose that an effect of 
the increasing pollution of the nutrient, as a result of the growing mass of 
bacteria in it, is that the probability of a bacterium having T increases as 
time goes on, quite independently of whether the mother bacterium has T. 
For example, if $\alpha_n = P(A_n|A_{n+1}) = a + b^{n+1}$ and $\beta_n = P(A_n|\neg A_{n+1}) = b^{n+1}$,
where a and b are positive numbers such that $a + b < 1$, then $\alpha_n$ and $\beta_n$ are
different from generation to generation, although $\gamma_n = a$ is constant. Note
that, since b is less than one, the factor $b^{n+1}$ increases as n decreases, so
in Barbara’s remote ancestry there was little pollution, but it increases from 
generation to generation until Barbara herself appears on the scene. Eq.(3.20) 
once more reduces to a finite geometric series that can be summed: 


$$P(q) = b\left[1 + ab + (ab)^2 + \cdots + (ab)^m\right] + a^{m+1}P(A_{m+1}) = b\,\frac{1-(ab)^{m+1}}{1-ab} + a^{m+1}P(A_{m+1}).$$
In the case of an infinite number of generations, since $(ab)^{m+1}$ and $a^{m+1}$ both
vanish in the limit of infinite m, we find

$$P(q) = \frac{b}{1-ab}. \tag{3.29}$$
For example, if a = 1 and b = 2, we find from (3.29) that P(g) = 3. 
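The pollution example can be checked with a minimal Python sketch. It iterates (3.18) with $\alpha_n = a + b^{n+1}$ and $\beta_n = b^{n+1}$, compares the result with the finite geometric sum above, and shows the approach to (3.29). The values $a = 0.5$ and $b = 0.3$ are hypothetical choices satisfying $a + b < 1$. The chain lengths are kept moderate so that the floating-point powers stay well away from underflow.

```python
def polluted_target(m, a, b, p_last):
    """P(q) by backward use of (3.18) with alpha_n = a + b**(n+1), beta_n = b**(n+1)."""
    prob = p_last
    for n in range(m, -1, -1):
        alpha_n = a + b ** (n + 1)
        beta_n = b ** (n + 1)
        prob = alpha_n * prob + beta_n * (1 - prob)
    return prob

def finite_sum(m, a, b, p_last):
    """Closed form of the finite geometric series for the same chain."""
    return b * (1 - (a * b) ** (m + 1)) / (1 - a * b) + a ** (m + 1) * p_last

a, b = 0.5, 0.3  # hypothetical values with a + b < 1
for m in (5, 50, 200):
    print(m, polluted_target(m, a, b, 0.0), polluted_target(m, a, b, 1.0), finite_sum(m, a, b, 1.0))
print(b / (1 - a * b))  # the limit (3.29)
```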

One might object that our argument so far is still not very realistic, to put 
it mildly. For a start, the assumption that conditional probabilities are known 
as precise numbers is a travesty of what is attainable in scientific practice. 
In real experiments the conditional probabilities are imprecise, merely being 
known to lie within some specified interval, and as a result, the unconditional 
probability of the target, too, is subject to measurement error. 

Fortunately, when the conditional probabilities are uniform, as for exam- 
ple in the case of Barbara, then it is relatively easy to determine the interval 
within which the target probability must lie. For suppose that $P(A_n|A_{n+1})$ is
in the interval $[\alpha_m, \alpha_u]$, and $P(A_n|\neg A_{n+1})$ is in the interval $[\beta_m, \beta_u]$. It can
be shown that expression (3.17) for P(q) is an increasing function of both $\alpha$
and $\beta$,³² and this means that the uncertainty in P(q) is given by


$$\frac{\beta_m}{1-\alpha_m+\beta_m} \leq P(q) \leq \frac{\beta_u}{1-\alpha_u+\beta_u},$$

on condition that $\alpha_u - \beta_m < 1$.
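The interval bounds can be checked with a minimal Python sketch. It uses hypothetical experimental intervals that are not taken from the text and that satisfy the condition above. The sketch evaluates expression (3.17) at the two extreme corners of the rectangle, then confirms by brute force on a grid that no interior point falls outside those bounds, as the monotonicity argument of footnote 32 predicts.

```python
def target(alpha, beta):
    """Expression (3.17) for P(q) with uniform conditional probabilities."""
    return beta / (1 - alpha + beta)

# Hypothetical experimental intervals (not from the text); alpha_u - beta_m < 1 holds.
alpha_m, alpha_u = 0.98, 0.995
beta_m, beta_u = 0.015, 0.025

lower = target(alpha_m, beta_m)  # both arguments at their lower ends
upper = target(alpha_u, beta_u)  # both arguments at their upper ends

# Brute-force check: evaluate (3.17) on a grid covering the whole rectangle.
steps = 40
values = [
    target(alpha_m + i * (alpha_u - alpha_m) / steps,
           beta_m + j * (beta_u - beta_m) / steps)
    for i in range(steps + 1)
    for j in range(steps + 1)
]
print(lower, upper, min(values), max(values))
```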

In the more general case where the conditional probabilities are not uni- 
form, the calculation of the uncertainty in the value of P(q) is a little more 
intricate. However, since the condition of probabilistic support is in force, all 
the terms in Eq.(3.23) are positive, and it can be done without too much ef- 
fort. One has to minimize and maximize each term, within the experimental 
error bounds, in order to obtain lower and upper bounds on P(q). 

Even so, one might still feel the urge to protest that we are not dealing with 
real-life situations. No bacterium has an infinite number of ancestor bacteria,
if only because of the fact of evolution from more primitive algal slime, 


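32 The partial derivatives of $\beta/(1-\alpha+\beta)$ with respect to $\alpha$ and $\beta$ are both positive:

$$\frac{\partial}{\partial\alpha}\,\frac{\beta}{1-\alpha+\beta} = \frac{\beta}{(1-\alpha+\beta)^2} > 0, \qquad \frac{\partial}{\partial\beta}\,\frac{\beta}{1-\alpha+\beta} = \frac{1-\alpha}{(1-\alpha+\beta)^2} > 0.$$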
which had grown out of earlier life forms, which sprang from inanimate
matter, which originated in a supernova explosion, and so on. 

This is of course true, and it makes short shrift of any remaining thought 
about a beginning in the form of a first bacterium.³³ For our approach,
however, the issue is moot. The reason is that the further away a node in the
chain is from the target, the smaller its influence on the target becomes. Ap- 
plied to Barbara: long before we reach the stage where her ancestor bacteria 
evolve from more primeval life forms, they have become totally irrelevant 
to the question whether Barbara has T. This phenomenon we call ‘fading 
foundations’, and it is explained in the next chapter. 


33 Sanford 1975, 1984; Rescher 2010, 56. 
Chapter 4


Fading Foundations and the Emergence of 
Justification 


Abstract 

A probabilistic regress, if benign, is characterized by the feature of fading 
foundations: the effect of the foundational term in a finite chain diminishes 
as the chain becomes longer, and completely dies away in the limit. This 
feature implies that in an infinite chain the justification of the target arises 
exclusively from the joint intermediate links; a foundation or ground is not 
needed. The phenomenon of fading foundations sheds light on the difference 
between propositional and doxastic justification, and it helps us settle the 
question whether justification is transmitted from one link in the chain to 
another, as foundationalists claim, or whether it emerges from a chain or 
network as a whole, as is maintained by coherentists and infinitists. 


4.1 Fading Foundations 


In the previous chapter we have introduced the idea of a probabilistic regress, 
and we have seen that such regresses are in general unproblematic: they 
mostly have a calculable limit, thus providing the target proposition, q, with
a unique probability value. In all but a few exceptional cases there is no con- 
ceptual problem in saying that q is probabilistically supported by an epis- 
temic chain of infinite length. 

An important part of our argument concerned the röle of the foundational 
or grounding proposition, p. In calculating the unconditional probability of 
the target, q, we managed to eliminate all the unconditional probabilities
except that of p. The factor P(p) remained the only term in the chain of
which the value was unknown. Consider the finite chain 
$$q \leftarrow A_1 \leftarrow A_2 \leftarrow \cdots \leftarrow A_{m-1} \leftarrow A_m \leftarrow p,$$


where q is probabilistically supported by A1, which is probabilistically sup- 
ported by A2, ..., and so on, until Am, which is probabilistically supported 
by the grounding proposition or belief p. 

In any finite chain, we need to know the value of P(p) in order to
calculate P(q). However, the importance of the unknown P(p) for the
probability of the target, P(q), lessens as m gets bigger. If the chain is very short,
consisting only of two propositions, q and p, then the importance of P(p) for
P(q) is at its height: all the support for q comes from p (together with the 
pair of conditional probabilities that connect the one to the other). But now 
imagine that the chain is a little bit longer, consisting of three propositions: 


$$q \leftarrow A_1 \leftarrow p.$$
In terms of nested rules of total probability this becomes: 


$$P(q) = P(q|\neg A_1) + [P(q|A_1) - P(q|\neg A_1)]\{P(A_1|\neg p) + [P(A_1|p) - P(A_1|\neg p)]P(p)\}. \tag{4.1}$$


In (4.1) the importance of P(p) has somewhat decreased. It is still the case 
that it largely determines P(q), but the influence of the conditional proba- 
bilities has become greater. In general it is so that, as the chain becomes 
longer, the support provided by the totality of the conditional probabilities 
increases, while that given by the foundation decreases. In other words, as m 
in $A_m$ grows larger and larger, a law of diminishing returns comes into force:
the influence of P(p) on P(q) tapers off with each link, until it finally fades
away completely. In the limit that m tends to infinity, all the probabilistic
support for q comes from the conditional probabilities together, and none from
the ground or foundation. This characteristic, which is essential to a
probabilistic regress as we defined it, we call the feature of fading foundations. As we
add more and more links to the chain the influence of P(p) tails off, and P(q) 
draws closer and closer to its final value. 

The feature of fading foundations can be illustrated by our story about 
Barbara bacterium in the previous chapter. Recall that q is the proposition 
‘Barbara has trait T’, $A_n$ is ‘Barbara’s ancestor in the nth generation has T’,
and p is ‘Barbara’s primordial mother has T’. Now imagine that long and
extensive empirical research in our laboratory has taught us that the proba- 
bility that a bacterium has T is 0.99 when her mother has T, and that it is 
0.04 when her mother lacks T: 
$$P(q|A_1) = P(A_1|A_2) = \cdots = P(A_{m-1}|A_m) = P(A_m|p) = 0.99$$
$$P(q|\neg A_1) = P(A_1|\neg A_2) = \cdots = P(A_{m-1}|\neg A_m) = P(A_m|\neg p) = 0.04$$


Let us further take for the unconditional probability of p the value 0.7. With 
the numbers we have chosen for the conditional probabilities, 0.99 and 0.04, 
the computed values for the unconditional probability of q are listed in the 
following table: 


Table 4.1 Probability of q when the probability of p is 0.7

Number of A’s      1     2     5     10    25    50    75    100   ∞
Probability of q   .710  .714  .726  .743  .774  .793  .798  .799  .8


The first entry in this table refers to the chain $q \leftarrow A_1 \leftarrow p$, where there is
only one A. With the values that we have chosen in our example, the
probability of the target proposition q yielded by this chain is 0.710 (0.70975 before
rounding). The second entry corresponds to the chain $q \leftarrow A_1 \leftarrow A_2 \leftarrow p$. Here
there are two A’s, so the probabilistic support for q has grown, resulting in a
probability for q that is somewhat higher, namely 0.714. The third entry refers to
a chain of seven propositions: the target proposition q, five A’s and the grounding
proposition p. The support is still further augmented, and the probability of q equals
0.726. By including more and more A’s we observe that the probabilistic
support for q grows. The final entry corresponds to the situation where the
chain is infinitely long. Here the probabilistic support for q has reached its
maximum, culminating in the unconditional probability P(q) = 0.8. The latter
can be considered to be the ‘real’ value for the probability of q relative to the
numbers chosen for the conditional probabilities.¹

But now look at the second table, 4.2, where the conditional probabilities 
are the same as in Table 4.1, but where the unconditional probability of p 
is 0.95. There are two things that should be noted about these two tables. 
Firstly, the probability of q in Table 4.2 culminates in a limiting value that
is the same as that in Table 4.1, namely 0.8. Secondly, while the numbers in 


1 In this table as well as in the following one, the values of the conditional
probabilities are uniform, remaining the same throughout the chain. As has been explained
in the previous chapter, and in more detail in the appendices, this is however not
essential to the phenomenon of fading foundations. The argument goes through, in 
the usual class, when the values of the conditional probabilities differ from link to 
link. 
Table 4.1 steadily increase as the number of links becomes larger, those in
Table 4.2 go down. How can we understand these facts? 


Table 4.2 Probability of q when the probability of p is 0.95

Number of A’s      1     2     5     10    25    50    75    100   ∞
Probability of q   .935  .929  .910  .885  .840  .811  .803  .801  .8


The answer is provided by the feature of fading foundations. As the chain 
lengthens, the role of the foundation p becomes less and less important until 
it dies out completely. At the end of the day, the probability of q is fully 
determined by the conditional probabilities; everything comes from them 
and the influence of the foundation p has completely disappeared from the 
picture. The numbers in Table 4.1 go up, while those in Table 4.2 go down,
because in the first case the probability of p is lower than the
final real value of P(q), relative to the chosen conditional probabilities, while
in the second case it is higher. This is exactly what is to be expected as the
foundational influence gradually peters out. 
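Both tables can be regenerated with a few lines of Python. The minimal sketch below assumes, as in the text, uniform conditional probabilities 0.99 and 0.04. It applies the rule of total probability once per link, starting from P(p): a chain with k intermediate A’s has k + 1 links. The printed rows should reproduce Tables 4.1 and 4.2 up to rounding in the last digit. Every row converges to the same limit 0.8, whichever value is assigned to P(p).

```python
ALPHA, BETA = 0.99, 0.04  # uniform conditional probabilities of Tables 4.1 and 4.2

def prob_q(num_a, p_ground):
    """P(q) for q <- A_1 <- ... <- A_k <- p, with k = num_a intermediate A's.

    k intermediate propositions means k + 1 links, hence k + 1 applications
    of the rule of total probability, starting from P(p)."""
    prob = p_ground
    for _ in range(num_a + 1):
        prob = BETA + (ALPHA - BETA) * prob
    return prob

limit = BETA / (1 - ALPHA + BETA)  # expression (3.17)
columns = [1, 2, 5, 10, 25, 50, 75, 100]
for p_ground in (0.7, 0.95):
    row = " ".join(f"{prob_q(k, p_ground):.3f}" for k in columns)
    print(f"P(p) = {p_ground}: {row}  limit {limit:.3f}")
```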

Lewis and Russell were right that, in a probabilistic regress, something 
goes to zero if m goes to infinity. However, this ‘something’ is not the value 
of P(q), as they thought. Rather it is the influence that the foundation p has 
on the target q. This is not to say that p itself has become highly improbable, 
for p may have any probability value at all. It is rather that, in the limit, the 
effect of the would-be foundation p has faded away completely: the support 
it gives to q is nil.²


4.2 Propositions versus Beliefs 


Up to this point we have not distinguished between propositional and dox- 
astic justification: q, the A’s, and p could be either propositions or beliefs.


2 The fading influence of the foundation p should not be confused with the
familiar washing out of the prior in Bayesian reasoning. In Bayesian updating, the prior
probability becomes less and less important under the influence of new pieces of in- 
formation coming in, until it washes out completely. Although this looks rather like 
the phenomenon of fading foundations, where the influence of p similarly dimin- 
ishes, the two phenomena are actually quite different, as we explain in Appendix 
C. 
However, it has often been pointed out that the distinction is relevant when
we talk about justification, especially if we discuss the possibility of infinite
justificatory chains. In this section we will look at a debate between Michael
Bergmann and Peter Klein in order to explain how the phenomenon of fading
foundations can shed light on the subject.³

Bergmann has criticized Peter Klein’s infinitism by arguing that, although
propositional justification might go on and on, doxastic justification must
always come to a stop; infinite epistemic chains and doxastic justification
simply seem incompatible.⁴ In a reply to Bergmann, Klein has acknowledged
that, unlike propositional justification, doxastic justification is always finite.
As he wryly notes, “We get tired. We have to eat. We have satisfied the
enquirers. We die”.⁵ He does not regard this as a difficulty for infinitism,
however, since the stop is merely contextual or pragmatic. According to Klein,
“doxastic justification is parasitic on propositional justification”: in principle
it can go on, but in practice it ends.⁶

Bergmann, however, believes that Klein’s position is untenable, arguing
as follows.⁷ In order to reject foundationalism, Klein must endorse the
following view:


K1: For a belief $B_i$ to be doxastically justified, it must be based on some
other belief $B_j$.


Bergmann then introduces 


3 See Peijnenburg and Atkinson 2014b. We will say a bit more about the distinction 
between propositional and doxastic justification in the next chapter, when we dis- 
cuss Klein’s reply to the notorious finite mind objection. For the difference between 
propositional and doxastic justification, see also Turri 2010. 

4 Bergmann 2007. Jonathan Kvanvig has argued that Klein’s infinitism has difficul- 
ties not only accounting for doxastic justification, but for propositional justification 
too (Kvanvig 2014). We will briefly come back to Kvanvig’s criticism in the next
chapter. 

5 Klein 2007a, 16. See Poston 2012, which contains a proposal for emerging
justification on the basis of Jonathan Kvanvig’s INUS conditions.

6 Ibid., 8. Michael Williams (Williams 2014, 234-235) has noted that the distinction 
between doxastic and propositional justification was introduced by Roderick Firth 
(Firth 1978). He recalls that Firth, too, claims that doxastic justification is parasitic 
on propositional justification, but argues that Firth attaches a completely different 
meaning to this claim than does Klein. As Williams sees it, Klein tries to combine 
an infinitist conception of propositional justification with a contextual conception of 
doxastic justification — a venture that, according to Williams, is doomed to failure 
(Williams 2014, 236-238). 

7 Bergmann 2007, 22-23. 


88 4 Fading Foundations and the Emergence of Justification 


K2: A belief Bᵢ can be doxastically justified by being based on some other
belief Bⱼ only if Bⱼ is itself doxastically justified.


and subsequently tries to catch Klein on the horns of a dilemma. Klein must
either accept or reject K2. If he rejects it, then he must maintain that a
belief Bᵢ can be doxastically justified by another belief Bⱼ even if the latter is
itself unjustified. This would turn Klein into a defender of what Bergmann
calls the unjustified foundations view, an outlook that is not particularly
Kleinian, to say the least. On the other hand, if Klein accepts K2 along with
K1, then he would run the risk of becoming a sceptic. For then “he is
committed to requiring for doxastic justification an infinite number of actual beliefs.
… But it seems completely clear that none of us has an infinite number of
actual beliefs”.⁸

The phenomenon of fading foundations points to an escape route out of
this dilemma, for it shows that there is another way to reject K2. If doxastic
justification indeed draws on propositional justification, as Klein claims, then
the justification that one belief gives to another also diminishes as the
distance between them increases. That is to say, a belief B₁ can be doxastically
justified by a chain of other beliefs, B₂, B₃, up to Bₙ, such that:


1. each Bₘ is conditionally justified by Bₘ₊₁, where 2 ≤ m ≤ n − 1;

2. Bₙ may be justified by another belief, or may justify itself, or may be
unjustified;

3. the effect of Bₙ on B₁ becomes smaller as n becomes bigger and bigger.


In the limit that n goes to infinity, the justificatory support given by Bₙ to B₁
vanishes completely. In that case it does not matter for the doxastic
justification of B₁ whether Bₙ is justified or not: B₁ can still be doxastically justified.
Klein and Bergmann are of course right that we cannot forever go on justi- 
fying our beliefs. But the phenomenon of fading foundations manifests itself 
already in chains of finite length. Often we need only a few links to observe 
that the influence of the foundational belief on the target belief has dimin- 
ished considerably. Of course, we can only be sure of what we seem to be 
observing in a finite chain if there exists a convergence proof for the corre- 
sponding infinite series, and a proof that the remainder term goes to zero: 
there needs to be knowledge of what happens in the infinite case in order for 
us to be certain that what we see in the finite case is a robust phenomenon 
rather than a mere fluctuation. But, as we have seen, such a proof can be
provided. Klein, too, argues that “rejecting K2 does not entail endorsing an


8 Bergmann 2007, 23. See also Bergmann 2014. 
unjustified foundationalist view” (Klein 2007b, 28). His argument is
different from ours, in that it refers, among other things, to a reason’s availability.
We however believe that our reasoning about fading foundations can capture 
Klein’s most important intuitions, and we will come back to availability in 
the next chapter. 

Let us sum up. In doxastic justification the choice is not between
indefinitely going on and the unjustified foundations view. There is a third
possibility, provided by what we know about infinite chains. Once we have
recognized that any justification that Bₙ gives to B₁ diminishes as the distance
between the two increases, we might decide to stop at Bₙ because the
justificatory contribution that any further belief would bestow on B₁ is deemed
to be too small to be of interest. When exactly a justificatory contribution 
is considered to be negligible depends on pragmatic considerations, but our 
two tables show that we are able to make these considerations as precise as 
we wish. 
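How a pragmatic tolerance translates into a stopping point can be sketched in a few lines, assuming (as in the uniform chains behind our tables) that every link carries the same conditional probabilities α = P(Aₘ|Aₘ₊₁) and β = P(Aₘ|¬Aₘ₊₁), with the values 0.99 and 0.04 used throughout this chapter. By the rule of total probability each link passes on the probability of its reason with weight α − β, so after k links the most that the ground can still shift the target is (α − β)ᵏ. The sketch looks for the first k at which that weight drops below a chosen tolerance; the function name `links_needed` and the parameter `eps` are our own illustrative choices.

```python
def links_needed(alpha: float, beta: float, eps: float) -> int:
    """Smallest number of links k for which the ground's weight
    (alpha - beta)**k in P(q) falls below the tolerance eps.

    Assumes uniform conditional probabilities, with
    alpha = P(A_m | A_{m+1}) and beta = P(A_m | not A_{m+1}).
    """
    gamma = alpha - beta              # weight passed on by a single link
    if not 0.0 < gamma < 1.0:
        raise ValueError("need 0 < alpha - beta < 1 for the weight to shrink")
    if eps <= 0.0:
        raise ValueError("the tolerance must be positive")
    weight, k = 1.0, 0
    while weight >= eps:              # the ground still matters at this tolerance
        weight *= gamma
        k += 1
    return k


if __name__ == "__main__":
    # Conditional probabilities 0.99 and 0.04, as in the uniform chains of this chapter.
    for eps in (0.1, 0.01, 0.001):
        print(eps, links_needed(0.99, 0.04, eps))
```

With these values, tolerances of 0.1, 0.01 and 0.001 call for 45, 90 and 135 links respectively. Note that the answer depends only on the conditional probabilities and the tolerance, not on how probable the ground itself is.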

This third possibility goes unnoticed in the debate between Bergmann
and Klein. Because the fact of fading foundations has not been taken into
account, they fail to realize that the expression ‘stopping at a belief Bₙ’ can
have more meanings than those that have been envisioned in the literature.
It need not mean ‘making an arbitrary move’, as some coherentists have
claimed. Nor need it imply that Bₙ is taken to be unjustified or self-justified.
Rather, an agent can decide to stop at a belief Bₙ because she realizes that,
for her purposes, Bₙ₊₁ has become irrelevant for the justification of B₁. She
finds that the degree of justification conferred upon B₁ by her beliefs B₂ to Bₙ
is accurate enough, and she feels no call to make it more accurate by taking
Bₙ₊₁ into account. For her, the justificatory contribution that Bₙ₊₁ gives to
B₁ has become negligible, and with our tables she can precisely identify a
point at which the role of Bₙ is small enough to be neglected. Here, as before,
we use the word ‘justificatory’ to mean probabilistic support plus
something else.

In this way we have given a more precise meaning to contextualist
considerations that have often been expressed. For example, Klein:


The infinitist will take the belief that q to be doxastically justified for S just 
in case S has engaged in providing ‘enough’ reasons along the path of
endless reasons. … How far forward … S need go seems to me a matter of the
pragmatic features of the epistemic context.⁹


9 Klein 2007a, 10. 
We don’t have to traverse infinitely many steps on the endless path of reasons.
There just must be such a path and we have to traverse as many as contextually 
required.¹⁰


And Nicholas Rescher: 


In any given context of deliberation the regress of reasons ultimately runs 
out into ‘perfectly clear’ considerations which are (contextually) so plain that 
there just is no point in going further. … Enough is enough.¹¹


Our method, however, differs from what Klein and Rescher seem to have in
mind. As we will explain in more detail in Section 5.3, where we argue for a view of
justification as a kind of trade-off, the level of accuracy of the target can be 
decided upon in advance. Whether this level will be reached after we have 
arrived at proposition number three, four, sixteen, or more, depends on the 
structure of the series and on the chosen level. In no way does it depend on 
the question of how obvious proposition number three, four, sixteen, etc. is. 
Even if the proposition at issue is very obvious, and thus has a high probabil- 
ity, its contribution to the justification of the target might be small enough to 
be neglected. This is different from the contextualism of Klein and Rescher, 
according to which an agent stops when the next belief in the chain is suffi- 
ciently obvious and itself not in need of justification. 


4.3 Emergence of Justification 


It has been said that foundationalists and anti-foundationalists (that is,
coherentists and infinitists) conceive justification differently: the former
gravitate towards an atomistic concept of justification, whereas the latter see it
as a holistic notion.¹² Consequently, foundationalists regard justification as
a property that can be transmitted or transferred from one proposition to an- 
other. The idea here is that justification somehow arises as a quality attached 
to a particular proposition, notably to the ground p, and then via inference 
is conveyed to the neighbouring proposition. The inferences themselves in 
no way affect the property that they transfer. They are just conduits, as
McGrew and McGrew would have it, completely neutral in character, like wifi
connecting two computers.¹³


10 Ibid., 13.

11 Rescher 2010, 47.

12 Sosa 1980; BonJour 1985; Dancy 1985.
13 McGrew and McGrew 2008. 
Anti-foundationalists, on the other hand, have a different outlook. For
them justification is not a property that is transmitted from one link in the 
chain to another; rather it emerges gradually from the chain as a whole. In 
the words of Peter Klein: 


Foundationalists think of propositional justification as a property possessed 
autonomously by some propositions which, by inference, can then be trans- 
mitted to another proposition — just as a real property can be transmitted 
from one owner to another once its initial ownership is established. But of 
course, the infinitist, like the emergent coherentist, does not paint this
picture of propositional justification. … [T]he infinitist conceives of propositional
justification of a proposition as emerging whenever there is an endless,
non-repeating set of propositions available as reasons.¹⁴


... the infinitist does not think of propositional justification as a property that 
is transferred from one proposition to another by such inference rules. Rather, 
the infinitist, like the coherentist, takes propositional justification to be what I 
called an emergent property that arises in sets of propositions.¹⁵


However, infinitists and coherentists experience great difficulty in explaining 
emergence. What exactly does it mean to say that justification emerges from 
a chain of propositions? How precisely does justification gradually arise 
from a chain or a web of beliefs? Champions of emergence illustrate their 
views by invoking arresting images, such as Neurath’s boat or Sosa’s raft. 
Although such metaphors are striking and helpful, they fail to inform us how 
exactly emergence can occur. It is one thing to claim that justification can 
emerge, but quite another to come up with a mechanism which explains how 
this can happen. Yet the latter is what we need. When emergence is called 
on to save the day for the anti-foundationalist, an account of the mechanism 
behind it ought to be specified in detail. Without such an account, emergence 
is in danger of being not much more than a name, and the appeal to it runs 
the risk of remaining gratuitous or ad hoc. 

We believe that our concept of probabilistic support can help us here. 
For it carries with it the idea of fading foundations, which explains how 
justification can gradually emerge.¹⁶ Look again at Table 4.1. It reveals the
justification as it emerges from an infinite chain of reasons, and as a result
we see the justification of q materializing in front of our eyes, as it were. The


14 Klein 2007a, 16.

15 Klein 2007b, 26.

16 Frederik Herzberg also argues that our notion of probabilistic support can help
explain emergence (Herzberg 2013).
table enables us to give a precise interpretation of what Klein writes about
justification as seen by infinitists (recall that for Klein doxastic justification 
is parasitic on propositional justification): 


... the infinitist holds that propositional justification arises in sets of proposi- 
tions with an infinite and non-repeating structure such that each new member 
serves as a reason for the preceding one. Consequently, an infinitist would 
seek to increase the doxastic justification of an initial belief — the belief requir- 
ing reasons — by calling forth more and more reasons. The more imbedded the 
initial belief, the greater its doxastic justification.¹⁷


Thus for Klein justification increases by lengthening the chain. A similar 
idea has been expressed by Jeremy Fantl: 


The infinitist [claims] that, for any particular series of reasons, the degree of 
justification can be increased by adding an adequate reason to the end of that 
series. Infinitism [claims]: … the longer your series of adequate reasons for a
proposition, the more justified it is for you.¹⁸


Our analysis can give a more precise meaning to these claims by Klein and 
Fantl. For it makes it clear that phrases like ‘the emergence of justification’ 
or ‘the increase of justification’ are in fact ambiguous. They can mean that, 
by adding more and more reasons, the value of the unconditional probability 
of q becomes larger and larger. But they can also mean that, by adding more
reasons, the value of the unconditional probability of q draws closer to its
final value (relative to the numbers chosen). It is the latter meaning that we 
are talking about here. In Table 4.1 it is the case that, every time we add an 
extra link to the chain, the probability of q rises until it reaches its maximum 
value. A rising value is however not essential for justification to emerge. This 
can be appreciated in Table 4.2, where the conditional probabilities are the 
same as those in Table 4.1, but where the unconditional probability of p is 
0.95. 

As in Table 4.1, in Table 4.2 the justification of q emerges as the
number of A’s gets bigger, for now q is, as Klein would say, more imbedded.
However, it is not so that the probability of q rises with each step. As we 


17 Klein 2007b, 26. 

18 Fantl 2003, 554. Fantl defends infinitism on the grounds that, of all the theories of 
justification, it is best equipped to satisfy two requirements: the degree requirement 
(“a theory of the structure of justification should explain why or show how justifi- 
cation is a matter of degree”) and the completeness requirement (“a theory of the 
structure of justification should explain why or how complete justification makes 
sense”) — ibid., 538. That reasoning itself can generate justification has also been 
advocated by Mylan Engel (2014) and John Turri (2014). 




add more and more reasons, the probability of q gets closer and closer to its 
final value, but numerically it goes down, namely from 0.935 to 0.8. Klein’s 
phrase “[t]he more imbedded the initial belief, the greater its doxastic jus- 
tification” or Fantl’s phrase “the longer your series of adequate reasons for 
a proposition, the more justified it is for you” should therefore be properly 
interpreted. The phrases are correct under the interpretation: the longer the 
chain that justifies the target q, the more reliable the justification of q is, for 
the closer the unconditional probability of q is to its real value. What cannot 
be meant is: the longer the chain that justifies the target q, the greater the
unconditional probability of q. The justification of q can ascend in reliability
while the probability of q descends in numerical value. So we should be
careful about what we mean when we say that justification emerges: we do
not mean that the unconditional probability of the target proposition q
necessarily increases numerically; rather, we mean that this probability gradually
moves towards its limit. 

So far we have worked under the assumption that the values of P(p) lie
strictly between 0 and 1. Indeed, both Tables 4.1 and 4.2 respect this
restriction. However, the assumption is necessary neither for fading foundations
nor for the emergence of justification. The two tables below illustrate this
point.


Table 4.3 Probability of q when the probability of p is 1
Number of Aₙ      1     2     5     10    25    50    75    100   ∞
Probability of q  .981  .971  .947  .914  .853  .814  .804  .801  .8


Table 4.4 Probability of q when the probability of p is 0
Number of Aₙ      1     2     5     10    25    50    75    100   ∞
Probability of q  .078  .114  .212  .345  .589  .742  .784  .796  .8


These tables are based on the same uniform conditional probabilities that we 
used before, that is 0.99 and 0.04. However, in Table 4.3 the unconditional 
probability of p is one and in Table 4.4 it is zero. They are extreme values, 
and admittedly they yield strange consequences. For example, if P(p) = 0, 
then p can scarcely be called a reason for q. And if P(p) = 1, then p cannot 
provide probabilistic support for any proposition (this is the root of the
infamous problem of old evidence). Yet the tables reveal how ineffective the rôle
of p is in the long run. For even with a P(p) that is zero, the final probability
of q is still 0.8: justification can emerge even when the foundation is
non-existent. Notwithstanding the extreme values of P(p), the final probability
of q is the same, and moreover the same as it was in Tables 4.1 and 4.2.¹⁹
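The entries of Tables 4.3 and 4.4 can be checked with a short computation. This is a sketch under our own reading of the tables: we take ‘Number of Aₙ’ to count the intermediate propositions A₁, …, Aₙ between q and p, so that a chain with n of them has n + 1 links, each with P(Aₘ|Aₘ₊₁) = 0.99 and P(Aₘ|¬Aₘ₊₁) = 0.04. On that reading, iterating the rule of total probability from the ground to the target reproduces the tabulated values to within rounding, as well as the value 0.935 with which Table 4.2 (where P(p) = 0.95) starts. The function names `prob_q` and `limit` are ours.

```python
def prob_q(n_intermediate: int, p_ground: float,
           alpha: float = 0.99, beta: float = 0.04) -> float:
    """Unconditional probability of the target q for the chain
    q <- A1 <- ... <- An <- p with uniform conditional probabilities.

    alpha = P(A_m | A_{m+1}), beta = P(A_m | not A_{m+1}).
    Reading 'Number of A_n' as n intermediate propositions is an assumption.
    """
    prob = p_ground                       # start at the ground p
    for _ in range(n_intermediate + 1):   # n + 1 links lead from p to q
        # rule of total probability for one link
        prob = alpha * prob + beta * (1.0 - prob)
    return prob


def limit(alpha: float = 0.99, beta: float = 0.04) -> float:
    """Value of P(q) for an infinite chain: beta / (1 - alpha + beta)."""
    return beta / (1.0 - alpha + beta)


if __name__ == "__main__":
    columns = [1, 2, 5, 10, 25, 50, 75, 100]
    for p_ground in (1.0, 0.0):           # Table 4.3, then Table 4.4
        row = [round(prob_q(n, p_ground), 3) for n in columns]
        print(p_ground, row, round(limit(), 3))
```

Whatever value is given to P(p), the rows converge on the same number, 0.04/0.05 = 0.8; with P(p) = 0.95 the sequence approaches it from above, which is the non-monotonic emergence discussed in connection with Table 4.2.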

In sum, we have argued that, in a probabilistic model of epistemic jus- 
tification, justification is not something that one proposition or belief re- 
ceives lock, stock and barrel from another. Rather it gradually emerges from 
the chain as a whole. As the distance between the source p and the target 
q increases, the influence of the unconditional probability of p on the
unconditional probability of q decreases; in the limit of an infinite chain, the
probability of q reaches its final value, and the only contributions to this 
value come from the infinite set of conditional probabilities. So when we 
go probabilistic, a law of diminishing returns goes hand in hand with a law 
of emerging justification: the more the justification of the final proposition 
materializes, the less is the influence of the grounding proposition. 
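The bookkeeping behind this law of diminishing returns can be written out explicitly. The derivation below is a sketch in our own labels: A₀ stands for the target q, A_k for the ground p, αₘ = P(Aₘ|Aₘ₊₁), βₘ = P(Aₘ|¬Aₘ₊₁) and γₘ = αₘ − βₘ. Only the rule of total probability is used.

```latex
% One link, by the rule of total probability, with \gamma_m = \alpha_m - \beta_m:
P(A_m) = \alpha_m P(A_{m+1}) + \beta_m \bigl[ 1 - P(A_{m+1}) \bigr]
       = \beta_m + \gamma_m P(A_{m+1})

% Iterating from A_0 = q down to A_k = p (an empty product counts as 1):
P(q) = \sum_{j=0}^{k-1} \Bigl( \prod_{i=0}^{j-1} \gamma_i \Bigr) \beta_j
     + \Bigl( \prod_{i=0}^{k-1} \gamma_i \Bigr) P(p)

% Uniform links (\alpha_m = \alpha, \beta_m = \beta, 0 < \gamma < 1):
P(q) = \frac{\beta \, (1 - \gamma^{k})}{1 - \gamma} + \gamma^{k} P(p)
       \;\longrightarrow\; \frac{\beta}{1 - \alpha + \beta} \qquad (k \to \infty)
```

The sum involves conditional probabilities only; P(p) enters solely through the last term, the remainder term, whose weight is the product of the γ's. With α = 0.99 and β = 0.04 the limit is 0.04/0.05 = 0.8, the final value in all four tables.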


4.4 Where Does the Justification Come From? 


In a finite probabilistic chain, part of the justification comes from the ground 
and part comes from the conditional probabilities that connect the ground to 
the target. If the series is infinite, then all of the justification is carried by the 
conditional probabilities, and none by the ground. One might however still 
be puzzled as to whence the justification comes. If justification does not have 
its origin in a foundation, then where does it come from? How can we make 
sense of there being justification without a ground? 

Most people agree that having justification somehow involves making 
contact with the world; as we said in Chapter 2, to call our beliefs justified 
means acknowledging that they at least remotely indicate how things actu- 
ally are. If one takes the view that contact with the world requires a ground, 
and that a ground is apprehended by a basic belief, and that a basic belief 
involves an unconditional probability, then it is puzzling indeed how infi- 
nite chains can do the job. Such a view would however be unduly restrictive. 
It assumes that notions like ‘applying to the real world’, ‘outside evidence’ 


19 If P(p) is zero or one, some of the conditional probabilities are not well-defined
according to Kolmogorov’s prescription. Alternative approaches to probability
theory exist, however, in which conditional probabilities are the basic quantities, and
we will come back to this in the next section. The important point here is that if
P(p) = 1 then P(Aₘ) = P(Aₘ|p), and P(Aₘ|¬p), which does not have a Kolmogorovian
definition, is not needed as an ingredient in the regress. Similarly, if P(p) = 0
then P(Aₘ) = P(Aₘ|¬p), and P(Aₘ|p) is not needed in the regress.
and ‘empirical results’ only make sense within a framework of basic beliefs.
This is questionable, since conditional probabilities are just as well equipped 
to carry the empirical burden. 

One might object that conditional probabilities are built up from uncon- 
ditional ones, and that one can only determine their values on the basis 
of unconditional probabilities. Such a complaint has in fact been made by 
Nicholas Rescher: 


There is … a more direct argument against the thesis that one can never
determine categorical probabilities but only conditional ones. This turns on the fact
that conditional probabilities are by definition no other than ratios of
unconditioned ones: P(q|p) = P(q&p)/P(p). So unless conditional probabilities are
somehow given by the Recording Angel they can only be determined (or
estimated) via our determination (or estimation) of categorical probabilities.
And then if the latter cannot be assessed, neither can the former.²⁰


It is true that, within standard probability theory, conditional and uncondi- 
tional probabilities can be defined in terms of one another. It is also true 
that Kolmogorov himself saw the unconditional probabilities as the basic el- 
ements. However, three considerations should be taken into account here. 
First, one is free to make another choice, and many philosophers have done 
so. Rudolf Carnap, Karl Popper and Alan Hájek all plump for conditional
probabilities as the more useful basic quantities. In fact, taking conditional
probabilities as primary has certain advantages: one can cover extreme
cases that cannot be handled if unconditional probabilities are regarded as
being fundamental.²¹ Second, we have not claimed that unconditional
probabilities can only be estimated via infinite regresses involving conditional
probabilities: rather we have shown that they can be computed in that way. 
Third and most important, there is no objection whatever to questioning the 
conditional probabilities in turn. Up to this point we have considered them 
as being given, but that is only a pragmatic stance, motivated by expository 
considerations. It is perfectly possible to unpack the conditional probabilities 
and consider them as targets that are themselves justified by further proba- 
bilistic chains. This possibility will be briefly touched upon in Section 6.4 


20 Rescher 2010, 40, footnote 18 (we adapted Rescher’s notation to ours). 

21 Carnap 1952; Popper 1959; Hájek 2011. Hájek mentions more philosophers who
made this choice: De Finetti 1974/1990; Jeffreys 1939/1961; Johnson 1921; Keynes
1921; Rényi 1970/1998. One can define P(q|p) as P(q ∧ p)/P(p) only if P(p) ≠ 0.
If one adopts this Kolmogorovian definition, one is unable to make sense of P(q|p)
when P(p) = 0. The approach of the philosophers mentioned above is free from this
difficulty.
and further explained in Section 8.5. But for the moment we ignore this
refinement.

Two final worries remain. First, how do we know that the conditional 
probabilities in our chain are ‘good’ ones, i.e. make contact with the world? 
What is the difference between our reasonings and those occurring in fiction, 
in the machinations of a liar, or in the hallucinations of a heroin addict? Or, 
applied to our example about bacteria, how can we distinguish the regress 
concerning Barbara and her ancestors from a fairy tale with the same struc- 
ture in which, instead of the inheritable trait T, there is an inheritable magical 
power, M, to turn a prince into a frog? 

The distinction is not far to seek. It lies in the mundane fact that in the for- 
mer, but not in the latter, the conditional probabilities arise from observation 
and experiment. Research on many batches of bacteria has established the
relevant conditional probabilities, α and β. These conditional probabilities
are typically obtained by repeated experiments: they are measured by count- 
ing how many ‘successes’ there are in a given number of trials, and then by 
dividing one number by the other (e.g. the number of bacteria that carry a 
trait, divided by the total number of bacteria in a sample). In the fairy tale, 
on the other hand, the only ‘evidence’ that M is inheritable is contained in 
the story itself — outside the tale there is no evidence at all. When it comes 
to series of infinite length, conditional probability statements are the sole 
bearers of the empirical load. Together they work to confer upon the target 
proposition an unconditional probability that expresses the proposition’s de- 
gree of justification. It is by virtue of the conditional probabilities that an 
infinite chain is not just an arbitrary construct that displays mere coherence, 
but rather can provide real justification, albeit of a probabilistic character. 
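The counting procedure just described can be illustrated with a toy simulation. It is not data: we simply assume, for illustration only, an inheritance law under which a daughter bacterium has trait T with probability 0.99 if its mother has T and 0.04 if she lacks it (the values used in our tables), generate synthetic mother and daughter pairs from that law, and recover α and β as relative frequencies, that is, as numbers of successes divided by numbers of trials. The function `estimate_alpha_beta`, the sample size and the seed are hypothetical choices.

```python
import random


def estimate_alpha_beta(n_pairs: int, alpha: float, beta: float,
                        p_mother: float = 0.5, seed: int = 0):
    """Estimate alpha and beta as relative frequencies from simulated
    mother/daughter pairs (a synthetic illustration, not real measurements)."""
    rng = random.Random(seed)
    with_t = [0, 0]      # [trials, successes] for mothers carrying T
    without_t = [0, 0]   # [trials, successes] for mothers lacking T
    for _ in range(n_pairs):
        mother_has_t = rng.random() < p_mother
        # the daughter inherits T with probability alpha or beta
        daughter_has_t = rng.random() < (alpha if mother_has_t else beta)
        counts = with_t if mother_has_t else without_t
        counts[0] += 1
        counts[1] += daughter_has_t   # True counts as 1, False as 0
    # successes divided by trials, separately for the two kinds of mother
    return with_t[1] / with_t[0], without_t[1] / without_t[0]


if __name__ == "__main__":
    a_hat, b_hat = estimate_alpha_beta(100_000, alpha=0.99, beta=0.04)
    print(f"estimated alpha = {a_hat:.3f}, estimated beta = {b_hat:.3f}")
```

The estimates land close to the values that generated the pairs; in a real laboratory, of course, those generating values are unknown, and the frequencies are all one has.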

We realize perfectly well that this answer will not convince the confirmed 
sceptic, but our opponent after all is a particular kind of foundationalist, not 
the sceptic. We do not have the temerity to aim at refuting the claim that all 
our perceptions might be illusory, or at outlawing evil demon scenarios, old 
and new. We simply assume that there is a real world, and that empirical facts 
can justify certain propositions, or more generally can sanction the probabil- 
ities that certain propositions are true. Here we merely take issue with any 
foundationalist claim to the effect that only basic beliefs or unconditional 
probabilities can be candidates for connecting world and thought. 

That brings us to the second worry. A foundationalist might not be
persuaded by the above considerations, arguing that the erstwhile rôle of the
basic belief is now being played by the set of conditional probabilities. In- 
deed, he might claim that we are worse off, for we seem to have traded one 
basic belief, viz. the remote starting point of the epistemic chain, for an
infinite number of conditional probability statements.

We do not want to get involved in a verbal dispute here: we are not
objecting to a type of foundationalism that acknowledges the empirical thrust of
conditional probabilities as well as the importance of fading foundations. 
This should not blind us, however, to the difference between conditional 
probabilities and the traditional basic beliefs. The former are essentially re- 
lational in character: they say what is to be expected if something else is the 
case. The latter are by contrast categorical: they say that something is the 
case, or that something can be expected with a certain probability. There is a 
great difference between averring that ‘Aₙ is true’ (or that the probability of
Aₙ is large) on the one hand, and holding that ‘if Aₙ₊₁ were true, the
probability that Aₙ is true would be α’, or ‘if Aₙ₊₁ were false, the probability
that Aₙ is true would be β’ on the other hand. Conditional probability talk is
discourse about relationals and hypotheticals. Our use of an infinite number 
of conditional probabilities amounts to the introduction of an infinite num- 
ber of relational statements. If all these statements satisfy the condition of 
probabilistic support as defined earlier, they can give rise to something that 
is no longer relational, but categorical. This categorical statement can in turn 
become the starting point of a new series of relational statements. And if this 
new series becomes sufficiently long, the influence of the categorical might 
die out, as we have seen. 

The situation is somewhat comparable to what happens in science or 
in logic.²² Scientists typically construct mathematical models on the basis
of empirical input, and then employ these models to draw new conclusions
about the world. Similarly, logicians make inferences on the basis of
premises that contain empirical information, thus producing new conclusions 
as output. In both cases, the output can in turn become the input for other 
models and inferences. And in neither case can the machinery work without 
input: logicians need their premises and scientists need their data. Since ev- 
ery assumption that serves as input can itself be questioned in turn, there is 
in this sense a foundation behind every foundation. One may interpret that 
as support for foundationalism (‘there is always a foundation!’) or as sup- 
port for anti-foundationalism (‘every foundation is a pseudo-foundation!’). 
Rather than let ourselves be drawn into such a debate, it might be more
fruitful to see what actually happens. And what happens is that a foundation
becomes less important as it recedes from the target.²³

22 Gijsbers 2015; Bewersdorf 2015.


4.5 Tour d’horizon 


Let us take stock. The epistemological regress problem, as we have intro- 
duced it in Chapter 1, led to a discussion of epistemic justification in Chapter 
2. The idea that epistemic justification has something to do with ‘probabili- 
fication’ is widespread among contemporary epistemologists: practically all 
agree that ‘Aⱼ justifies Aᵢ’ at least implies that Aᵢ is made probable by Aⱼ.
Yet, as we have been arguing in Chapters 3 and 4, the far-reaching
consequences of this unanimity for the regress problem in epistemology have
been insufficiently understood.

A few exotic cases excluded, talk about probability is Kolmogorovian talk. 
One of the theorems of Kolmogorov’s calculus is the rule of total probability, 
which enables us to determine the unconditional probability of q, namely
P(q). If q is made probable by an epistemic chain rather than a single
proposition, then the value of P(q) is obtained from an iterated rule of total
probability. It has often been thought that such an iteration does not make 
sense if it continues indefinitely, but, as we have seen in Chapter 3, this is 
simply a mistake. In all but the exceptional cases P(q) can be given a unique 
and well-defined value, even if the chain that supports it is infinitely long. 

The iteration in question is a complex formula that consists of two parts. 
The first part is a series involving all the conditional probabilities, the second 
part is what we have called the remainder term, which contains information 


23 The phenomenon of fading foundations is not restricted to probabilistic chains 
in epistemology; it can be proved (although we will not do that here) that it also 
applies in modified form to infinite chains of propositions that are ranked in the 
sense of Spohn (Spohn 2012). Moreover, fading foundations occur in non-epistemic 
causal chains, as long as ‘causality’ is interpreted probabilistically. This fact may 
shed light on various philosophical debates, such as the one on rigid designators, 
i.e. expressions that denote the same object in every possible world. The objects 
themselves, at least for Saul Kripke, are identified by following causal chains back- 
wards to the moment of baptism when they received their names. Gareth Evans 
noted a problem with this view: we can use proper names even if the causal chains 
are broken (his Madagascar-example in Evans 1973). In Addendum (e) to Nam- 
ing and Necessity Kripke comments that he leaves this problem “for further work” 
(Kripke 1972/1980, 163); but with a probabilistic conception of causality Evans’ 
problem disappears since the rôle and character of rigid designators change.
about the probability of the grounding proposition. What in this chapter we
have called fading foundations arises if and only if the following two require- 
ments are fulfilled: 


1. the series involving the conditional probabilities converges 
2. the remainder term goes to zero. 


The first requirement is always fulfilled if the condition of probabilistic
support has been satisfied for the entire chain; that is, if P(Aᵢ|Aⱼ) > P(Aᵢ|¬Aⱼ)
for all the links. The second requirement is only fulfilled if we are dealing 
with what we have been calling the usual class, i.e. the class of probabilistic 
regresses that are benign. Informally, this means that the conditional prob- 
abilities must not tend too quickly to those appertaining to an entailment. 
Formally, it means that they comply with 


$$\exists c > 0 \;\&\; \exists N > c : \forall n > N, \quad 1 - (\alpha_n - \beta_n) > \frac{c}{n}.$$


Whereas conditional probabilities that obey this constraint belong to the 
usual class, those that violate it make up the exceptional class. The latter we 
also call the class of quasi-bi-implication. The conditional probabilities in 
this class resemble bi-implications, and they fail to meet the above asymp- 
totic constraint. From this it follows that whenever we are dealing with a 
probabilistic regress in which the conditional probabilities are of the usual 
class, fading foundations will ensue. Indeed, the necessary and sufficient 
condition for fading foundations is membership of the usual class. 
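To make the two classes concrete, here is a minimal numerical sketch. It assumes the notation of the earlier chapters as we read it, in which $\gamma_n = P(A_n|A_{n+1}) - P(A_n|\neg A_{n+1})$. On that reading the remainder term is the product $\gamma_0\gamma_1\cdots\gamma_n$ multiplied by the probability of the ground, and because that probability is at most one, the product alone bounds the remainder. The two sequences of $\gamma_n$ are invented for illustration. In the first, $1-\gamma_n$ shrinks like $c/n$, as the usual class requires. In the second, it shrinks like $c/n^2$, so the $\gamma_n$ approach one too quickly.

```python
def running_products(gammas):
    """Return [g0, g0*g1, g0*g1*g2, ...]; these bound the remainder term."""
    products, prod = [], 1.0
    for g in gammas:
        prod *= g
        products.append(prod)
    return products


N = 10_000   # number of links followed (arbitrary choice)
c = 0.5      # the constant c of the usual-class condition (arbitrary choice)

# Hypothetical usual-class chain: 1 - gamma_n = c/(n+1), which stays above
# (c/2)/n, so the condition of the usual class is met.
usual = running_products([1 - c / (n + 1) for n in range(N)])

# Hypothetical exceptional chain: 1 - gamma_n = c/(n+1)**2 goes to zero too
# fast, so the gamma_n crowd towards 1 (quasi-bi-implication).
exceptional = running_products([1 - c / (n + 1) ** 2 for n in range(N)])

for n in (10, 100, 1_000, N):
    print(f"n={n:>6}: usual {usual[n - 1]:.5f}   exceptional {exceptional[n - 1]:.5f}")
```

The usual-class products keep falling towards zero (for these numbers roughly like $1/\sqrt{n}$). The exceptional products settle at a positive value of about a third, so the ground never stops contributing.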

Despite the technicalities we needed to prove it, the result itself is actu- 
ally very intuitive. If the conditional probabilities in a regress are very close 
to those corresponding to entailments, then we can only determine the truth 
value of the target if we know the truth value of the ground. Irrespective of 
the chain’s length, and thus irrespective of whether the ground is very close 
to the target or is far removed from it, the ground continues to make a con- 
tribution, and then the age-old regress problem rears its ugly head. But if the 
regress contains genuine conditional probabilities, i.e. conditional probabili- 
ties that do not resemble implications, then the remainder term goes to zero, 
and the regress is benign. 

Strictly speaking, as we noted in Chapter 3, footnote 29, in the usual 
class probabilistic support is not needed for convergence. But probabilistic 
support is important for three reasons. First, we are interested in epistemic 
justification, and this contains probabilistic support as a necessary element. 
Whatever it may mean to say that ‘$A_j$ justifies $A_i$’, part of its meaning is 
that $P(A_i|A_j) > P(A_i|\neg A_j)$. Second, we like to see epistemic justification as 
something that amounts to striking a balance. In justifying our beliefs, we 
set up a trade-off between the number of reasons that we can handle with 
our finite minds and the level of accuracy that we want to reach. As we will 
explain in the next chapter, probabilistic support is needed for such a view 
of justification as a trade-off. Third and finally, the condition of probabilis- 
tic support is needed for the convergence of the networks that we discuss in 
Chapter 8. 
Chapter 5 
Finite Minds 


Abstract 

Can finite minds encompass an infinite number of beliefs? There is a dif- 
ference between being able to complete an infinite series and being able to 
compute its outcome; and justification is more than mere calculation. Yet the 
number of propositions or beliefs that are needed in order to reach a desired 
level of justification for the target can be determined without computing an 
infinite number of terms: only a finite number of reasons are required for any 
desired level of accuracy. This suggests a view of epistemic justification as a 
trade-off between the accuracy of the target and the number of reasons taken 
into consideration. 


5.1 Ought-Implies-Can 


As in the past, the idea of infinite epistemic chains is still generally regarded 
as being nonsensical, and often for the same reasons. Scott Aikin has divided 
the various objections to infinite chains into two main categories: the ought- 
implies-can arguments, which are basically pragmatic in character, and the 
conceptual arguments.¹ In this chapter we deal with the first category; the 
conceptual arguments we will discuss in the next chapter. 
Ought-implies-can arguments in effect contain all the different versions of 
the notorious finite mind objection, which was already raised by Aristotle. 
They imply that justifying our beliefs only counts as an obligation in so far as 
we are capable of doing so. Given our human finitude we cannot complete an 


1 Aikin 2011, Chapter 2. 
infinite series of inferential justification, hence we are not obliged to perform 
this task. Aikin distinguishes two kinds of ought-implies-can arguments: 


On the one hand, there are arguments that the quantity of beliefs (and infer- 
ences) necessary is beyond us (for various reasons). This is the argument from 
quantitative incapacity. On the other hand, there are arguments that the quality 
(or kind) of belief necessary to complete the regress appropriately is one we 
simply cannot have. That is, because some belief in or about the series (and 
necessary for the series to provide epistemic justification) will be so complex, 
we cannot have it. And thereby, we cannot maintain the series in a way capable 
of amounting to epistemic justification. This is the argument from qualitative 
incapacity.² 


The idea is straightforward enough: because we are mortal and of restricted 
capacity, we are unable to handle epistemic chains that either contain an 
infinite number of beliefs or contain some beliefs that are too complicated 
for us to handle. 

But straightforward as it may seem at first sight, the idea is not always 
clear, and it has not always been expressed in the same way. Even among 
the philosophers who are most pertinacious in their disapproval of infinite 
epistemic chains, there is no agreement on this matter. For example, Michael 
Bergmann, as we have seen, deems it obvious that we cannot have an infinite 
number of beliefs: 


...it seems completely clear that none of us has an infinite number of actual 
beliefs, each of which is based on another...³ 


Noah Lemos agrees: 


One difficulty with [the option of an infinite chain] is that it seems psycholog- 
ically impossible for us to have an infinite number of beliefs. If it is psycho- 
logically impossible for us to have an infinite number of beliefs, then none of 
our beliefs can be supported by an infinite evidential chain.⁴ 


But Richard Fumerton has a different opinion: 


There is nothing absurd in the supposition that people have an infinite number 
of justified beliefs.⁵ 


2 Ibid., 52. The same distinction was made by John Williams when he discriminated 
between an infinite number of beliefs and an infinitely complex belief (Williams 
1981). 

3 Bergmann 2007, 23. 

4 Lemos 2007, 48. 

5 Fumerton 2006, 49. 
Klein is right that we do have an infinite number of beliefs.⁶ 


... there probably is no difficulty in supposing that people can have an infinite 
number of beliefs.⁷ 


This difference of opinion should perhaps not surprise us. After all, as noted 
earlier, it is entirely unclear how we should count our beliefs. This observation 
already intimates that knock-down arguments about whether we can or cannot 
have an infinite number of beliefs are not to be expected. 

Peter Klein has defended his infinitism against the finite minds objection 
by arguing that the objection is based on what he calls the ‘Completion 
Requirement’. According to this requirement, a belief can be justified for a 
person only if that person has actually completed the process of reasoning to 
the belief. Such a requirement, says Klein, is indeed against the spirit of 
infinitism, but it is also unrealistic in that it is too demanding: 


Of course, the infinitist cannot agree to [the Completion Requirement] be- 
cause to do so would be tantamount to rejecting infinitism. More importantly, 
the infinitist should not agree because the Completion Argument demands 
more than what is required to have a justified belief.⁸ 


Klein regards epistemic justification as being incomplete at heart: it is es- 
sentially provisional and can always be further improved. He fleshes out this 
view by means of two distinctions: the distinction between propositional and 
doxastic justification, and that between objective and subjective availabil- 
ity. Propositional justification, according to Klein, depends on the objective 
availability of reasons in an endless chain, where objective availability means 
that one proposition is a reason for another, so that it can be said to justify 
even if we are not aware of it. Doxastic justification, on the other hand, 
is parasitic on propositional justification and hinges on an availability that 
is subjective: a belief q is doxastically justified for an epistemic agent S if 
there is, in the endless chain of reasons, a reason for q that S can “call on”. 
Although in its entirety the chain can never be subjectively available to S’s 
finite mind, S can take a few steps on the endless path. How many steps S 
can take, or needs to take in order to reach doxastic justification, all depends 
on contextual factors: 


Infinitism is committed to an account of propositional justification such that 
a proposition, q, is justified for S iff there is an endless series of non-repeating 


6 Fumerton 2001, 7. 
7 Fumerton 1995, 140. 
8 Klein 1998, 920. 
propositions available to S such that beginning with q, each succeeding mem- 
ber is a reason for the immediately preceding one. It is committed to an ac- 
count of doxastic justification such that a belief is doxastically justified for S 
iff S has engaged in tracing the reasons in virtue of which the proposition q 
is justified far forward enough to satisfy the contextually determined 
requirements.⁹ 


We sympathize with Klein’s view, but the previous chapters have made it 
clear that our position differs in two ways. On the one hand it is weaker: 
where Klein holds that justification requires the objective availability of an 
infinite chain, we allow that there can be justification even if the chain 
terminates. In those cases the foundation still exerts some justificatory influence 
on the target; and just how much justificatory influence it exerts depends on 
other characteristics of the chain, such as its length and the speed with which 
the series of conditional probabilities converges. On the other hand, our po- 
sition is stronger than that of Klein: where he denies that infinite chains can 
be completed, we assert that they can. We only need to construe justifica- 
tion probabilistically and make sure that we are in what we have called ‘the 
usual class’, i.e. the domain where the probabilistic support is not too close 
to entailment.¹⁰ 


9 Klein 2007a, 11. We have substituted q for p. Cf. Section 1.2. 

10 While some have taken the view that Klein’s infinitism can account for proposi- 
tional but not for doxastic justification, Jonathan Kvanvig has argued it fails on both 
counts. His argument why it fails for propositional justification goes as follows. In 
Klein’s view, propositional justification either is relative to the total evidence avail- 
able or is not so relative (where ‘available’ is interpreted liberally: a reason need not 
be present in order to be available, but may be only ready to hand). If propositional 
justification is not relative to the total evidence available, then my justification for q 
might depend on which book I happen to have taken from my shelves: “one source 
can be the start of an infinite chain of reasons for thinking [q], and the other source 
the start of an infinite chain for [~q]” (Kvanvig 2014, 140). If, on the other hand, 
propositional justification is relative to the total evidence available, then scepticism 
looms. Suppose that evidence $E_1$ confirms q, that $E_2$ confirms $\neg q$, and that 
$E_1 \wedge E_2$ does not confirm q. Let person $S_1$ have $E_1$ as evidence, $S_2$ have 
$E_2$, and the infinitist have $E_1 \wedge E_2$. Then, Kvanvig argues, “if propositional 
justification is relative to total information”, none of these three have justification 
for q or for $\neg q$. Kvanvig’s 
argument rightly points to vagueness in the term ‘availability’, whether interpreted 
liberally or strictly. However, his argument seems to presuppose an ‘absolute’ con- 
cept of justification and moreover to equate justification with confirmation. With 
the relational concept of justification that we proposed in Chapter 2, and with the 
assumption that confirmation is necessary but not sufficient for justification, there 
does not seem to be a problem. 
5.2 Completion and Computation 


On the basis of the previous chapters our answer to the finite mind objection 
will not come as a surprise. If justification is probabilistically construed, then 
even the ‘Completion Requirement’ that Klein rebuts can be met.¹¹ For then 
infinite justificatory chains can indeed be completed in the sense that they 
yield a unique and well-defined probability value for the target proposition. 
And if it is possible to complete infinite chains, the finite mind objection 
does not arise. Although this answer to the finite mind objection differs from 
that of Klein, who after all asserts that completion and infinitism are irrec- 
oncilable, it does enable us to account for at least two of Klein’s intuitions, 
namely that epistemic justification gradually emerges along the chain and 
that contextual factors decide at which level of emergence we will decide 
that ‘enough is enough’.¹² 

However, Jeremy Gwiazda has argued that this reply to the finite mind 
objection does not work. As he sees it, we have not completed a probabilistic 
regress, but we have only computed its limit.¹³ There is a great difference, 
according to Gwiazda, between calculating the probability value of a target 
proposition on the one hand and actually giving reasons for that proposition 
on the other. Gwiazda does not discuss in detail what the differences are, 
but he might be thinking of a difference in time: while we can calculate 
the limit of an infinite series in a finite time, we are unable to come up, 
in a finite time, with an infinite number of reasons. As such, the difference 
resembles an important distinction that Nicholas Rescher has emphasized, 
namely between regresses which are time-compressible and those which are 
not. An example of the former is generated by the Zeno-like thesis ‘To reach 
a destination, you must first reach the halfway point to it’; an example of the 
latter is produced by ‘To make a journey to a destination, you must first make 
a journey to the halfway point to it’: 


The first thesis is true — and harmless: that is just how transit from point A to 
point B works. But the second is false and, moreover, vicious in rendering any 
sort of journey impossible. Zeno of Elea notwithstanding, a motion to reach or 
to cross endlessly many points is perfectly possible. But infinite journeying, 


11 This point appears to have been missed in Wright 2013. 

12 This is basically the way in which Frederik Herzberg, referring to insights about 
probabilistic regresses, has replied to the ‘new finite mind objection’ that was raised 
by Adam Podlaskowski and Joshua Smith. See Herzberg 2013, 373-374, and Pod- 
laskowski and Smith 2011. 

13 Gwiazda 2010. The same point was made by Matthias Steup (Steup 1989). 
with its inherent requirement for explicitly planned and acknowledged tran- 
sits, is an impossibility. And the reason for this lies not in the impossibility of 
motion, but in the fact that making a journey to somewhere (as distinct from 
reaching or arriving there) involves deliberation and intentional goal-setting. 
And since man is a finite being, an infinitude of conscious mental acts is im- 
possible for us. So while that first structural regress is harmless, the second 
regression of infinitely many consciously performed acts is an impossibility.¹⁴ 


In the same vein, it could be admitted that we, with our finite minds, are ca- 
pable of calculating the probability of a target proposition (in the previous 
chapters we have after all done so), but are incapable of giving an infinite 
number of reasons for this proposition, since the latter would require an in- 
finity of consciously performed acts. Because epistemological justification 
is about giving reasons, and not about making calculations, the finite mind 
objection applies in full force. 

A similar reaction to our views has been voiced by Adam Podlaskowski 
and Joshua Smith.¹⁵ They argue that, although “valuable lessons” can be 
drawn from our formal results, it is “entirely unclear” that these results 
meet a basic requirement, namely “providing an account of infinite chains 
of propositions qua reasons made available to agents”.¹⁶ Podlaskowski and 
Smith call this ‘the availability problem’: 


Given the distinctive emphasis that Peijnenburg, Atkinson, and Herzberg 
place on calculability, we have doubts about the extent to which (on their 
account) an infinite chain of propositions can serve as reasons that are avail- 
able to an agent. (This is what shall be called the availability problem facing 
the distinctive brand of infinitism under consideration).!7 


...it is hard to see, more generally, how the emphasis on calculability yields 
a notion of available reason (or availability) that can serve the infinitist’s 
purposes.¹⁸ 


Podlaskowski and Smith maintain that our analysis confuses two completely 
different things, namely being able to compute the probability of a target 


14 Rescher 2010, 25. Rescher uses several ways to express the distinction between 
time-compressible and non-time-compressible regresses; one of them is by saying 
that the latter need pre-conditions whereas the former only has co-conditions (ibid., 
55). 

15 Podlaskowski and Smith 2014. 

16 Ibid., 212. 

17 Ibid., 214. 

18 Ibid., 215. 
proposition on the one hand and having available reasons for this propo- 
sition on the other. They blame us for assuming that, “since mathematical 
means exist with which an agent can decide the probability of any propo- 
sition being true (even if it belongs to an infinite series), all the members 
of an infinite chain of reasons must thereby be available (as reasons) to an 
epistemic agent”.¹⁹ Like Gwiazda, they stress the difference between 
determining the probability of a target q and showing that something is a reason 
for q: 


deciding the probability of any given proposition ...even if there are infinite 
chains of propositions ...is still a far cry from showing that, as a matter of 
principle, each proposition in a chain of of propositions is one that can serve 
as a reason for another proposition in that chain, and do so in the right order. It 
appears that two dispositions have been conflated: those to make a certain sort 
of calculation, and those to accept any given proposition as reason for another 
proposition. ...[A] demonstration that finite agents can actually calculate the 
probability of a proposition’s truth — even if it belongs to an infinite chain of 
reasons — does not thereby show that each reason is equally available to a 
finite agent.²⁰ 


The observation of Gwiazda and Podlaskowski and Smith that computing 
and completing reflect two different dispositions is fair enough. However, 
as we will explain in the next section, in epistemic justification we draw on 
both. In this sense, justification resembles logic: there, too, we draw on an 
abstract, normative dimension concerning how one ought to reason, and a 
concrete, descriptive dimension concerning how one reasons in fact.²¹ 
Together the two dimensions suggest a view of justification as a trade-off 
between the accuracy of the target proposition and the capacity of our mental 
housekeeping. 


5.3 Probabilistic Justification as a Trade-Off 


Rescher is of course right that a time-compressible regress is different from 
a regress that is not time-compressible. And Gwiazda and Podlaskowski and 
Smith are right that making a calculation is not the same as giving a proposi- 
tion as reason for another proposition. The skill to compute the value of the 


19 Ibid., 215. 

20 Ibid., 216. Michael Rescorla’s complaint that our approach falls prey to 
‘hyper-intellectualism’ expresses a similar sentiment (Rescorla 2014). 

21 Van Benthem 2014, 2015. 
target on the basis of a probabilistic epistemic chain indeed differs from the 
capacity to have the propositions in the chain available as reasons. 

However, these two faculties are not separate, as the above authors seem to 
think.²² Especially when it comes to epistemic justification, of which 
probabilistic support is an essential part, these faculties are closely and essentially 
connected. A justificatory regress is not just any old regress; it is a regress 
about reasoning, in this case reasoning that involves how a proposition or 
belief is probabilistically justified by another. This means that the actual 
process of ‘giving probabilistic reasons’ is to a certain extent subject to 
the rules of the probability calculus, just as the actual process of ‘giving de- 
ductive reasons’ is to a certain extent subject to the rules of deductive logic. 
The aversion of Gwiazda and others to using calculations in the context of 
giving reasons might be exacerbated by the idea that this necessarily involves 
processing an infinite number of terms. That idea, although understandable, 
is however mistaken, and betrays a misconstrual of our view. 

We have argued that, whenever we give a reason, $A_1$, for a target q, the 
significance of $A_1$ as a reason depends on how much probabilistic support it 
gives to q. The latter in turn depends on how much $A_1$’s support for q deviates 
from the ‘final’ support, i.e. the support that q would receive from the entire 
justificatory chain of which $A_1$ is a member. And how much support q 
receives from the entire infinite chain depends on the chain’s character, i.e. on 
the values of its conditional probabilities together with the value of the un- 
conditional probability of the ground, p. While the conditional probabilities 
come from experiments, the unconditional probability of the ground is 
unknown.²³ The longer the chain, the smaller the contribution from the ground, 
and when the chain is infinitely long, the contribution from the ground to the 
target vanishes completely, leaving all the justificatory support to come from 
the combined conditional probabilities. 

The view could be easily misunderstood. It does not imply that ‘giving 
reasons’ depends on ‘making calculations’ in the sense that we first have 
to calculate the limit of a probabilistic regress before we can know what our 
reason is worth; computing the limit is not necessary for weighing the quality 
of our actual reasons. Rather, the structure of the probabilistic justificatory 


22 Recall the claim of Podlaskowski and Smith that it is “entirely unclear” what 
formal calculation means for “propositions qua reasons made available to agents” 
(Podlaskowski and Smith 2014, 212). 

23 In Chapter 8, Section 8.5, we will come back to the status of the conditional 
probabilities. In particular, we consider the situation in which they are not given, 
but are themselves in need of justification. As we will explain, a network is then 
created with a remarkable structure that resembles a Mandelbrot fractal. 
chain is such that it enables us to say how many reasons we need to call on 
in order to approach the probability of the target to a satisfactory level. To 
do that, we do not need to know the length of the chain; we need not even 
know whether it is finite or infinite. Nor do we have to know the probability 
of the ground. The only things we need are the values of a certain number 
of conditional probabilities (sometimes more, sometimes less, depending on 
the speed of the convergence) that suffice to take us to within a desired level 
of accuracy with respect to the true, but unknown probability of the target. 
Once we are there, we can safely ignore the rest of the chain — such is the 
lesson of fading foundations. 

An example might help to clarify the point. Imagine I have a reason $A_1$ 
for my belief q, and know the two relevant conditional probabilities, 
$P(q|A_1)$ and $P(q|\neg A_1)$. Suppose I am unable or unwilling to back up $A_1$ by 
a further reason, and therefore want to cut off the chain here. We have seen 
that knowing the conditional probabilities is in general not enough to know 
the value of P(q); especially with short chains like the one at hand it is in- 
dispensable that we also know the unconditional probability P(p). Even if I 
have no clue what the value of the latter is, I do know that it cannot be greater 
than one and cannot be smaller than zero. I now consider these two extremal 
cases, i.e. where P(p) = 1 and where P(p) = 0, and I find that in the first case 
P(q) =x and in the latter case P(q) = y. The condition of probabilistic sup- 
port now guarantees that the real value of P(q) lies in the interval between 
x and y, no matter how many further $A_n$ we take into consideration. What is 
more, the condition ensures that with every reason we add, the interval will 
become smaller, making the value of P(q) more precise with each step. This 
applies both in the uniform situation, where the conditional probabilities are 
all the same, and in the nonuniform case, where they are different. 
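The reasoning of this example can be sketched in a few lines of Python. The sketch assumes that $\alpha_k = P(A_k|A_{k+1})$ and $\beta_k = P(A_k|\neg A_{k+1})$, with $A_0 = q$. By the rule of total probability it then follows that $P(A_k) = \beta_k + (\alpha_k - \beta_k)P(A_{k+1})$. All the numbers are hypothetical. Cutting the chain off at $A_n$ and setting the unknown $P(A_n)$ to 0 or to 1 yields the two extremes. The assertions check that the value of $P(q)$ computed from the whole chain lies inside every interval, whatever stand-in value we give its ground.

```python
def bounds(alphas, betas, n):
    """Minimum and maximum of P(q) when the chain is cut off at A_n.

    alphas[k] = P(A_k | A_{k+1}) and betas[k] = P(A_k | not A_{k+1}), with A_0 = q.
    The unknown P(A_n) is set to 0 for the minimum and to 1 for the maximum.
    """
    lower, prod = 0.0, 1.0                # prod holds gamma_0 * ... * gamma_{k-1}
    for k in range(n):
        lower += prod * betas[k]
        prod *= alphas[k] - betas[k]      # gamma_k = alpha_k - beta_k
    return lower, lower + prod            # the gap between them is the error term


def exact(alphas, betas, p_ground):
    """P(q) for the whole chain, given the probability of its last proposition."""
    p = p_ground
    for a, b in zip(reversed(alphas), reversed(betas)):
        p = b + (a - b) * p               # P(A_k) = beta_k + gamma_k * P(A_{k+1})
    return p


# Hypothetical, non-uniform conditional probabilities; every alpha exceeds its beta
alphas = [0.90, 0.85, 0.95, 0.80, 0.90, 0.88, 0.92, 0.86]
betas = [0.30, 0.40, 0.20, 0.35, 0.25, 0.30, 0.33, 0.28]

for n in range(1, len(alphas) + 1):
    lo, hi = bounds(alphas, betas, n)
    print(f"{n} reason(s): {lo:.4f} <= P(q) <= {hi:.4f}   width {hi - lo:.4f}")
    for p_ground in (0.1, 0.5, 0.9):      # stand-ins for the unknown ground probability
        assert lo <= exact(alphas, betas, p_ground) <= hi
```

Each added reason raises the lower end and lowers the upper end. The width of the interval is the product of the $\gamma_k$ so far, so it shrinks at every step.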

As a result, I can determine how many reasons I need to have in order to 
approach the true probability of the target q within an error margin of, for 
example, 1%. If this number of reasons happens to be too large to fit into 
my finite mind, then I will have to relax the level, and be content with a 
degree of justification that is further away from the true probability of the 
target. But if the number of reasons is rather small, so that they all fit in 
my finite mind (although perhaps not in that of my four-year-old daughter), 
then I can always tighten up the satisfaction level, and come closer to the 
target’s true probability. Epistemic justification thus boils down to striking 
a balance. In acting as responsible epistemic agents, we are instigating a 
trade-off between the number of reasons that we can handle and the level of 
accuracy that we want to reach. If we are unable or unwilling to manage a 
large number of reasons, we have to pay in terms of a lack of precision and 
hence of trustworthiness. Taking the short route thus comes at a price, but in 
situations where precision is not important, we can take it easy and should 
do so on pain of exerting ourselves unnecessarily. 
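The trade-off can also be put as a small search. In this sketch, the hypothetical function `reasons_needed` walks down the chain one reason at a time. It stops as soon as the error term is no larger than a chosen fraction of the minimum value of $P(q)$, which is the criterion used in the worked examples of this section. If that does not happen within the number of reasons our mind can hold (`budget`), it returns `None`, and the tolerance has to be relaxed. The notation ($\alpha_k$, $\beta_k$, $\gamma_k = \alpha_k - \beta_k$) and the uniform numbers are assumptions made for illustration.

```python
def reasons_needed(alphas, betas, tolerance, budget):
    """Return (n, minimum, error) for the smallest n <= budget whose error term
    gamma_0 * ... * gamma_{n-1} is at most `tolerance` times the minimum of P(q);
    return None if no such n exists within the budget."""
    minimum, error = 0.0, 1.0
    for n in range(1, min(budget, len(alphas)) + 1):
        k = n - 1
        minimum += error * betas[k]      # add gamma_0 ... gamma_{k-1} * beta_k
        error *= alphas[k] - betas[k]    # multiply in gamma_k
        if error <= tolerance * minimum:
            return n, minimum, error
    return None


# Hypothetical uniform chain of 100 reasons (m = 99): alpha = 0.9, beta = 0.2
alphas, betas = [0.9] * 100, [0.2] * 100

for tolerance in (0.25, 0.05, 0.01):
    print(tolerance, reasons_needed(alphas, betas, tolerance, budget=100))

# A mind that can hold only six reasons cannot reach 1% with this chain
print(reasons_needed(alphas, betas, 0.01, budget=6))
```

With these numbers, six reasons suffice for a 25% margin, ten for 5%, and fifteen for 1%. With a budget of six, the 1% target is out of reach, and we must settle for a looser margin.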

Let us spell out this idea of a trade-off more fully and more formally. 
Assume a finite chain to consist of five propositions, the target proposition 
q, the intermediate propositions $A_1$ to $A_3$, and the ground $A_4$: 


$$P(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \gamma_0\gamma_1\gamma_2\gamma_3 P(A_4) \tag{5.1}$$


As we explained in Sections 3.5 and 3.6, the right-hand side consists of two 
terms. The first term is the sum of the conditional probabilities, 


$$\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 ,$$


and the second is the remainder term, 


$$\gamma_0\gamma_1\gamma_2\gamma_3 P(A_4) .$$


This remainder term is a product of two factors, $\gamma_0\gamma_1\gamma_2\gamma_3$ and $P(A_4)$. Since 
we suppose the conditional probabilities to be known, there is only one 
probability that we need to know in order to compute $P(q)$. This is $P(A_4)$, i.e. 
the unconditional probability of the ground. If we did know $P(A_4)$, then we 
would know $P(q)$. 
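For readers who want the intermediate step, here is a sketch of how (5.1) follows from the rule of total probability. It assumes abbreviations that we take to be those of Sections 3.5 and 3.6: $A_0 = q$, $\alpha_n = P(A_n|A_{n+1})$, $\beta_n = P(A_n|\neg A_{n+1})$ and $\gamma_n = \alpha_n - \beta_n$.

```latex
\begin{align*}
P(A_n) &= P(A_n|A_{n+1})\,P(A_{n+1}) + P(A_n|\neg A_{n+1})\,\bigl(1 - P(A_{n+1})\bigr)\\
       &= \beta_n + \gamma_n\,P(A_{n+1}),\\[4pt]
P(q)   &= \beta_0 + \gamma_0 P(A_1)
        = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1 P(A_2) = \cdots\\
       &= \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3
          + \gamma_0\gamma_1\gamma_2\gamma_3\,P(A_4).
\end{align*}
```

Only the last term involves the ground, and it does so through the product $\gamma_0\gamma_1\gamma_2\gamma_3$.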

However, suppose we have no clue as to the value of $P(A_4)$. What to do? 
Because of the condition of probabilistic support, (2.1), all the $\gamma_n$ are positive, 
which means that every term in (5.1) is positive too. Therefore the smallest 
value that P(q) could have, given the conditional probabilities, is obtained by 
giving P(A4) the minimum value that it could have, which is zero, leaving 
only 


$$\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 . \tag{5.2}$$


On the other hand, the largest value that P(q) could have is obtained by 
giving P(A4) the maximum value that it could have, which is one, yielding 


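$$\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \gamma_0\gamma_1\gamma_2\gamma_3 . \tag{5.3}$$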


We know that the value of P(q) must lie somewhere between the two ex- 
tremes (5.2) and (5.3). If we were to assume the value of P(q) to be one 
extreme, for example (5.2), then we would be sure that our error could not 
be larger than the difference between the maximum, (5.3), and the minimum, 
(5.2), namely $\gamma_0\gamma_1\gamma_2\gamma_3$. 

Now imagine that the error term $\gamma_0\gamma_1\gamma_2\gamma_3$ turns out to be, for example, only 
1% of the minimum value (5.2). And suppose further that we proclaim our- 
selves satisfied with a value that deviates by no more than 1% from the true 
value of $P(q)$. Then we need go no further in inquiring as to any support that 
the ground, A4, might have from some other proposition. This is because any 
extension of the chain, obtained by adding a proposition, $A_5$, that supports 
the erstwhile ground A4 would only increase the minimum (5.2) to 


$$\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \gamma_0\gamma_1\gamma_2\gamma_3\beta_4 ,$$


and decrease the error to $\gamma_0\gamma_1\gamma_2\gamma_3\gamma_4$ (this is smaller, because the extra factor, 
$\gamma_4$, is less than one). This is precisely what fading foundations imply. So in 
this case we know exactly how many reasons we need in order to approach 
the true value of the target to a level that satisfies us. If we are content with 
a value that deviates no more than 1% from the true value of P(q), then 
we require no more than four reasons for q, namely $A_1$ to $A_4$. And if our 
mind is big enough to store these four reasons, then we have accomplished 
our task: we have justified q to a satisfactory level, staying neatly within the 
limitations of our finite mind. Note that we have performed our task without 
knowing the true value of $P(q)$ or that of $P(A_4)$. 
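The claim that adding $A_5$ can only raise the minimum and lower the maximum can be checked directly. In the sketch below, $L_4$ is the minimum (5.2) and $U_4 = L_4 + \gamma_0\gamma_1\gamma_2\gamma_3$ is the maximum (5.3). As we read the notation of Chapter 3, we assume $\gamma_4 = \alpha_4 - \beta_4$, with $\alpha_4 = P(A_4|A_5) \le 1$ and $\beta_4 = P(A_4|\neg A_5) \ge 0$.

```latex
\begin{align*}
L_5 &= L_4 + \gamma_0\gamma_1\gamma_2\gamma_3\,\beta_4 \;\ge\; L_4,\\
U_5 &= L_5 + \gamma_0\gamma_1\gamma_2\gamma_3\gamma_4
     = L_4 + \gamma_0\gamma_1\gamma_2\gamma_3\,(\beta_4 + \gamma_4)
     = L_4 + \gamma_0\gamma_1\gamma_2\gamma_3\,\alpha_4
     \;\le\; L_4 + \gamma_0\gamma_1\gamma_2\gamma_3 = U_4,\\
U_5 - L_5 &= \gamma_4\,(U_4 - L_4).
\end{align*}
```

The same steps, with the indices shifted, apply to every further extension of the chain.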

What to do when the error term $\gamma_0\gamma_1\gamma_2\gamma_3$ turns out to be very big, for 
example 90% of the minimum value (5.2)? How should we proceed now? If 
our level of required accuracy is still 1%, then there is not much that we can 
do in this case. We might sadly conclude that there is much uncertainty, 
due to the fact that the justificatory influence of the unknown $P(A_4)$ on $P(q)$ 
is very great, but that is as far as we can get. For the four reasons that we can 
avail ourselves of, $A_1$ to $A_4$, are of little help: jointly they bring us to a point 
where the deviation from the true value of $P(q)$ may be as great as 90%. 
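The sketch below, with invented numbers, shows one way the error term can grow this large. It compares a four-link chain whose conditional probabilities lie close to those of an entailment with a four-link chain that offers only moderate support. The function `min_and_error` is a hypothetical helper. It returns the minimum (5.2) and the error term $\gamma_0\gamma_1\gamma_2\gamma_3$, assuming $\gamma_k = P(A_k|A_{k+1}) - P(A_k|\neg A_{k+1})$.

```python
def min_and_error(alphas, betas):
    """Minimum of P(q) (ground set to 0) and the error term gamma_0 * ... * gamma_{n-1}."""
    minimum, error = 0.0, 1.0
    for a, b in zip(alphas, betas):
        minimum += error * b      # add the next beta, weighted by the gammas so far
        error *= a - b            # multiply in gamma_k
    return minimum, error


chains = {
    # Hypothetical numbers; alpha = P(A_k | A_{k+1}), beta = P(A_k | not A_{k+1})
    "near entailment": ([0.99] * 4, [0.01] * 4),
    "moderate support": ([0.6] * 4, [0.3] * 4),
}
results = {}
for name, (alphas, betas) in chains.items():
    minimum, error = min_and_error(alphas, betas)
    results[name] = error / minimum
    print(f"{name:>16}: minimum {minimum:.4f}, error term {error:.4f}, "
          f"error/minimum {error / minimum:.2f}")
```

With the near-entailment numbers the error term is more than twenty times the minimum, so four reasons say almost nothing about $P(q)$. With moderate support the error is only about 2% of the minimum.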

However, let us now make the finite chain considerably longer. Rather 
than assuming that there are four reasons for g, let us suppose that there are 
one hundred: 


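$$P(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\gamma_1\cdots\gamma_{m-1}\beta_m + \gamma_0\gamma_1\cdots\gamma_m P(A_{m+1}) , \tag{5.4}$$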
where m = 99. It is unlikely that I can store all these reasons in my finite mind, 
so I decide to cut off chain (5.4) at the seventh proposition, making a provisional 
stop at proposition $A_6$. So I get: 


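$$P(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\gamma_1\gamma_2\gamma_3\gamma_4\beta_5 + \gamma_0\gamma_1\gamma_2\gamma_3\gamma_4\gamma_5 P(A_6) . \tag{5.5}$$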


In formula (5.5) I can only compute P(q) if I know $P(A_6)$. Since I have no 
idea as to the value of the latter, I apply the same reasoning as above. That 
is, I first recall that the value of P(q) must lie between two extremes. The 
one extreme is obtained by putting the unknown $P(A_6)$ equal to zero. The 
other extreme is obtained by putting it equal to one. Suppose that I adopt the 
first extreme, $P(A_6) = 0$, so my estimation of P(q) is that it has its minimum 
value. I know that my error in making this estimation cannot be larger than 
the difference between this minimum value and the maximum value of P(q), 
obtained by putting $P(A_6) = 1$. The difference itself is given by our error 
term, which in this case is $\gamma_0\gamma_1\gamma_2\gamma_3\gamma_4\gamma_5$. 

Now suppose that the error term $\gamma_0\gamma_1\gamma_2\gamma_3\gamma_4\gamma_5$ is only 1% of the minimum 
value of P(q). And suppose again that I am satisfied with an accuracy that 
deviates no more than 1% from the true value of P(q). If I am capable of 
storing six reasons in my head, then I am done. In particular, I do not have to 
go on and find a justification for $A_6$. As we have seen, the reason for this lies 
in the fact that, as the chain lengthens, the minimum value of P(q) increases 
and the maximum value decreases — which is a direct consequence of the 
condition of probabilistic support. This condition implies that any extension 
of the chain would only make the minimum value of P(q) greater, and thus 
would make the error term itself smaller.”* Consequently, adding a propo- 
sition to chain (5.5), for example proposition A7, would bring us closer to 
the true value of P(q); and since we are already satisfied with our level of 
approximation, there is no need to engage in this project. We have in fact 
reached the point where ‘enough is enough’, and this expression now has a 
very precise meaning. For any justificatory chain, I can first define the level 
of accuracy within which I want to approach the true value of P(q), and I 
can then determine how many reasons I need to reach this level. In order to 
perform these tasks, I need not know the value of P(q), nor that of P(A6),
nor that of any other ground. More importantly, I can blissfully neglect the 
rest of the chain. For not only is it so that I am within the desired 1% of the 
true probability value, it is also the case that calling on any further reason 
will only bring me closer to that true value. As the chain gets longer, the 
remainder term gets smaller (in accordance with fading foundations) and the 
sum of the conditional probabilities gets larger (in accordance with the con- 
dition of probabilistic support). So as m gets bigger, the value of the sum of 
the conditional probabilities increases monotonically, whereas the remain- 
der term decreases monotonically. Therefore, if we are already satisfied with 
1%, any extension of the chain will bring us still closer to the true value of 
P(q); there is thus no need to call on more reasons than the six reasons that 
we have (subjectively) available. 
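The stopping rule described here can be sketched as follows. As a hypothetical illustration, take αn = P(An|An+1), βn = P(An|¬An+1) and γn = αn − βn, with A0 = q. The function adds links one at a time and stops as soon as the error term is no larger than a chosen fraction of the minimum value of P(q), which is the 1% criterion of the text. It refuses links that violate probabilistic support (αn > βn), since the monotone behaviour of the minimum depends on that condition. The callables `alpha(n)` and `beta(n)` are our own device for feeding in a chain of any length; only the links actually used are ever evaluated.

```python
def links_needed(alpha, beta, tolerance, max_links=100_000):
    """Smallest number of links L for which error <= tolerance * minimum.

    alpha(n) = P(A_n | A_{n+1}) and beta(n) = P(A_n | not A_{n+1}) for link n (A_0 = q).
    Returns (L, minimum, maximum); links beyond L are never looked at.
    """
    minimum, error = 0.0, 1.0
    for n in range(max_links):
        a, b = alpha(n), beta(n)
        if not 0.0 <= b < a <= 1.0:
            raise ValueError(f"link {n} violates probabilistic support or is not a probability")
        minimum += error * b   # add gamma_0 ... gamma_{n-1} * beta_n
        error *= a - b         # the error term is now gamma_0 ... gamma_n
        if error <= tolerance * minimum:
            return n + 1, minimum, minimum + error
    raise RuntimeError("tolerance not reached within max_links links")


# Uniform hypothetical chains, using the parameter pairs of Tables 5.1 and 5.2, 1% criterion.
for a, b in [(0.99, 0.04), (0.95, 0.45)]:
    L, low, high = links_needed(lambda n: a, lambda n: b, 0.01)
    print(f"alpha={a}, beta={b}: {L} links, P(q) between {low:.4f} and {high:.4f}")
```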

But now suppose that the error term γ0γ1γ2γ3γ4γ5 in formula (5.5) still
differs greatly from the minimum value of P(q); let us now say by 80%.

24 See Appendix A.2 for the proof.

The situation is not the same as it was in the case of (5.1). Since Eq. (5.5) is part of
a larger chain, we have the option to go on and to look for the justification of
A6 in terms of A7; after that, we can go further and justify A7 in terms of A8,
and so on. The more propositions we add, the more we lengthen our chain,
and the smaller will be the difference between the value of P(q) given by (5.4)
and its minimum value. We are now able to reduce the error to less than 80% of the true
value of P(q). However, there is a price to pay. In getting closer and closer 
to the real value of P(q), we are calling on more and more reasons, and our 
finite minds have to accommodate each and every extra reason that we call 
on. It could happen that our minds lack the capacity to take in all the reasons 
that our level of accuracy requires. In that case the only option left open for 
us is to relax the accuracy level to a degree where it corresponds to a number 
of reasons that can be housed in our heads.²⁵ We are committed to a trade-off:
we simply cannot have our cake and eat it too. 
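The trade-off can be made concrete in a small hypothetical setting. Again taking αn = P(An|An+1) and βn = P(An|¬An+1), with A0 = q, the parameter `capacity` stands for the number of links a given mind can hold; it is an assumption introduced purely for illustration. If the desired accuracy is reachable within that capacity, we stop as soon as it is reached; otherwise we use every link we can hold and report the relaxed accuracy we must settle for, measured as the error term divided by the minimum value of P(q).

```python
def trade_off(alpha, beta, desired_tolerance, capacity):
    """Return (links_used, achieved_tolerance) for a mind that can hold `capacity` links.

    alpha(n) = P(A_n | A_{n+1}), beta(n) = P(A_n | not A_{n+1}); achieved_tolerance is
    the error term divided by the minimum value of P(q) at the point where we stop.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least one link")
    minimum, error = 0.0, 1.0
    for n in range(capacity):
        a, b = alpha(n), beta(n)
        if not 0.0 <= b < a <= 1.0:
            raise ValueError(f"link {n} violates probabilistic support or is not a probability")
        minimum += error * b
        error *= a - b
        if error <= desired_tolerance * minimum:
            return n + 1, error / minimum   # desired accuracy reached within capacity
    # Capacity exhausted: the accuracy actually attained is all we can claim.
    return capacity, (error / minimum if minimum > 0 else float("inf"))


# Hypothetical: 1% desired, room for only six links, uniform alpha = 0.99, beta = 0.04.
used, achieved = trade_off(lambda n: 0.99, lambda n: 0.04, 0.01, capacity=6)
print(f"used {used} links; must settle for an error of {achieved:.0%} of the minimum")
```

Either the tolerance is met early or it is relaxed to whatever the available links allow; the function never asks for a link beyond `capacity`.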

What we have said above is of course not restricted to finite chains such 
as (5.1), (5.4), and (5.5). The reasoning about error terms works just as well 
with an infinite chain as with a finite chain. In both cases we can work out, 
in a finite number of steps, how many terms we need to reach a particular, 
pragmatically determined level of accuracy. If it turns out that our level of 
accuracy requires more reasons than we can accommodate, then we are living
beyond our means. We should then either work harder and try to create more
space in our finite minds, or become more modest and lower our desire for
accuracy. All this can be done without having to call on, or even to calculate,
all the terms in a (finite or an infinite) series.

25 Whether a particular number of reasons can or cannot be housed in our heads
might depend not just on size or on capacities, but also on other factors. Linda
Zagzebski has distinguished between two kinds of epistemic reasons for believing a
proposition q: theoretical reasons, which are third personal and “connect facts about
the world with the truth of [q]”, and deliberative reasons, which are first personal and
“connect me to getting the truth of [q]” (Zagzebski 2014, 244). Even if, impossibly,
we were able to complete our search for theoretical reasons, that would still leave
us with a second problem, namely that what we call ‘reasons’ may not indicate the truth:
“We would still need trust that there is any connection between what we think are
the theoretical reasons and the truth” (ibid., 250). Zagzebski argues that this second
problem can only be solved by calling on a deliberative reason with a special status,
viz. epistemic self-trust, which ends our urge to search for further theoretical or
deliberative reasons. It is not excluded that Zagzebski’s epistemic self-trust might
be a factor in the process of trading off. Other possible factors might perhaps be the
localist considerations of Adam Leite, or the “plausibility considerations” that Ted
Poston mentions in support of his claim that “there is more to epistemic justification
than can be expressed in any reasoning session” (Leite 2005; Poston 2014, 182-183).
We expect that our trade-off can also be combined with Andrew Norman’s
“dialectical equilibrium” (Norman 1997, 487) and Michael Rescorla’s “dialectical
egalitarianism” (Rescorla 2009), although we are not sure if the authors themselves
would agree.

The two tables below illustrate the idea. In the first, the conditional
probabilities, α and β, have the values 0.99 and 0.04 respectively; in the second,
they are 0.95 and 0.45. ‘Maximum P(q)’ and ‘Minimum P(q)’ refer to the
values that P(q) has when P(p), the probability of the ground p, is one or zero, respectively.
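The entries of the two tables can be checked by hand. Assuming, as the numbers suggest, that α and β are the same for every link and that ‘Number of A_n’ counts the n intermediate reasons between q and the ground p (so that the chain has n + 1 links), the extremal values of P(q) follow from a geometric series, with γ = α − β:

$$\min P(q) = \beta\sum_{k=0}^{n}\gamma^{k} = \frac{\beta\,(1-\gamma^{n+1})}{1-\gamma}, \qquad \max P(q) = \min P(q) + \gamma^{n+1}, \qquad \lim_{n\to\infty} P(q) = \frac{\beta}{1-\gamma} = \frac{\beta}{1-\alpha+\beta}.$$

For α = 0.99, β = 0.04 and n = 2, for instance, this gives a minimum of 0.04 × (1 + 0.95 + 0.9025) ≈ 0.114 and a maximum of about 0.114 + 0.857 ≈ 0.971, while the common limit is 0.04/0.05 = 0.8.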


Table 5.1 Extremal values of P(q) when α = 0.99 and β = 0.04.

Number of A_n      1     2     5    10    15    25    50   100     ∞
Minimum P(q)    .078  .114  .212  .345  .448  .589  .742  .796    .8
Maximum P(q)    .981  .971  .947  .914  .888  .853  .815  .801    .8


Table 5.2 Extremal values of P(q) when α = 0.95 and β = 0.45.

Number of A_n       1      2      3      4      5      6      8     10      ∞
Minimum P(q)     .675   .788   .844   .872   .886   .893  .8982  .8996     .9
Maximum P(q)     .925   .913   .906   .903   .902   .901  .9002  .9000     .9

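Assuming uniform α and β, and reading ‘Number of A_n’ as n intermediate reasons between q and a ground p whose probability is set to 0 or 1, the two tables can be regenerated with a short script. This is only a sketch for checking the figures; the names `extremes` and `print_table` are our own.

```python
def extremes(alpha, beta, n):
    """Minimum and maximum of P(q) with n intermediate reasons, uniform alpha and beta."""
    gamma = alpha - beta
    minimum = beta * (1 - gamma ** (n + 1)) / (1 - gamma)   # ground p false
    return minimum, minimum + gamma ** (n + 1)               # ground p true


def print_table(alpha, beta, counts):
    """Print one table in the layout of Tables 5.1 and 5.2, ending with the limit value."""
    limit = beta / (1 - alpha + beta)
    rows = [extremes(alpha, beta, n) for n in counts]
    print("Number of A_n " + " ".join(f"{n:>6}" for n in counts) + f"{'inf':>6}")
    print("Minimum P(q)  " + " ".join(f"{lo:6.3f}" for lo, _ in rows) + f"{limit:6.3f}")
    print("Maximum P(q)  " + " ".join(f"{hi:6.3f}" for _, hi in rows) + f"{limit:6.3f}")


print_table(0.99, 0.04, [1, 2, 5, 10, 15, 25, 50, 100])
print()
print_table(0.95, 0.45, [1, 2, 3, 4, 5, 6, 8, 10])
```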

In the first table one needs more than fifty intermediate reasons A_n to ensure
that the difference between the maximum and the minimum of P(q) is
relatively small, whereas in the second table a similar uncertainty is already
reached after a mere three reasons A_n. There the situation is much more
amenable.

Justification as a form of trade-off sheds light on the difference
between propositional and doxastic justification that we discussed in Section 4.2.
Some scholars appear to be of the opinion that propositional and doxastic 
justification can never be combined, since the former is abstract and infinite, 
while the latter is concrete and finite by definition. 

Others have however argued that doxastic justification is parasitic on 
propositional justification, and that the context determines when exactly it 
comes to an end. Our considerations in this section clarify the latter position, 
and they make clear how this contextualism can be interpreted. 
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When commenting on our approach, Podlaskowski and Smith write that 
“care must be taken when assessing the significance of these formal re- 
sults”.?° Of course we agree, and it can be added that the same applies to 
assessing results that are not formal: whenever we informally discuss rea- 
soning, or justification, or probability, we must take care what we say. For 
example, as we have seen, it is incorrect to say that an infinite probabilistic 
regress yields zero for the target, or that knowing the value of the target re- 
quires knowing the value of a basic belief. Intuitive as these claims might be, 
they are incorrect as they stand. 

The difference of opinion between Podlaskowski and Smith and us, if 
there is one, concerns the relation between the ability to calculate and the 
ability to give reasons. As we explained in the previous section, we believe 
that epistemic justification involves both. Podlaskowski and Smith seem 
however to interpret us differently, thinking that for us having the mathe- 
matical ability to calculate is sufficient for having justification. This is for 
instance the message from their instructive example about Carl, who is a real 
pundit when it comes to calculating probabilities, but who cannot understand 
the meaning of reasons: 


[I]magine Carl, whose impressive talent in calculating conditional probabili- 
ties is strangely at odds with his ability to grasp various concepts. Carl has no 
problem solving all manner of complex equations, including those involving 
conditional probabilities (such as Peijnenburg, Atkinson, and Herzberg pro- 
vide). Yet, there are various concepts which he is entirely incapable of grasp- 
ing, some of which might feature in reasons whose probabilities of being true 
are conditional on other reasons. Suppose that Carl is given two lists, an infi- 
nite list of conditional probability assignments and an infinite list of reasons. 
Unbeknownst to Carl, the two lists correspond perfectly: the list of probabili- 
ties is meant to capture the probability of each reason being true, conditional 
on its predecessor. Moreover, some of the members of the list of reasons are 
comprised of those concepts that Carl is incapable of grasping. Even if Carl 
were capable of working through some infinite list of reasons, at some point 
on the list at hand, Carl would fail to comprehend the concepts deployed. But 
he would have no problem doing the corresponding calculations. Does merely 
calculating the probability of the chain make Carl justified in holding any of 
those beliefs, when Carl is incapable of understanding the concepts on which 
those beliefs depend? Surely not. If an agent cannot understand some of the
reasons in the infinite chain, it is difficult to see how those reasons can do any
justificatory work for him.²⁷

26 Podlaskowski and Smith 2014, 212.


Podlaskowski and Smith suggest that, according to us, Carl has justified his 
beliefs. This is however not so: for us, as for Podlaskowski and Smith, Carl 
fails to justify. Our view is not that calculation implies justification, but that 
justification implies a certain amount of ‘calculation’. Of course we realize 
that people often put forward probabilistic reasons for their beliefs without 
knowing anything about the probability calculus. As epistemologists who 
want to take the concept of probabilistic reasoning seriously, however, we 
believe that a minimum of adjustment to the probability calculus is
required, even if only in a rational reconstruction.

Podlaskowski and Smith seem to have anticipated this response when they 
write: 


One might . . . suspect that we have crafted the Carl case too narrowly, and that 
it misses some important aspect of what mathematical analyses of probabilistic
regresses are supposed to be doing.²⁸


However, they then suggest that our response requires a new notion of ‘avail- 
able reason’ which cannot be developed within our approach: 


Perhaps there is a notion of available reason that can supplement the project 
of Peijnenburg et al. that avoids the problems raised by the Carl case. The 
problem with successfully developing such a response, however, is that it is 
entirely unclear what sort of notion they could use, given their emphasis on 
calculability. To see this, consider the spectrum of possible views. On one end, 
the notion of availability drops out. This end of the spectrum has the unfor- 
tunate consequence that the view collapses into maintaining that a belief is 
justified for a person when there merely exists an infinite, non-repeating chain 
of reasons that makes the belief probable. ... On the other end of the spectrum,
one might hold a very strong notion of availability, according to which
it is required that one actually believe a reason for it to be available. But this
is far too strong, as it runs face-first into the original finite minds objection to
infinitism. ... One lesson to draw from the Carl case is that moving a brand of
infinitism beyond Klein’s middle ground on the notion of availability proves
seriously problematic ....²⁹


27 Ibid., 216.
28 Ibid., 217.
29 Ibid.

Here Podlaskowski and Smith write as if there are only two possibilities:
either we merely calculate, and then no reason qua reason is available, or
we hold on to a strong notion of availability, but then we run into the finite
mind objection. Our remarks in the previous section provide us with a no- 
tion of availability that avoids the two extremes that Podlaskowski and Smith 
present. Often it is enough that only a few reasons are available in order to 
draw conclusions that go far beyond what is implied by these available rea- 
sons themselves. If the reasons in question bring us close enough to the true 
value of the target, then the phenomenon of fading foundations tells us that 
we can ignore the rest of the chain. If, on the other hand, the reasons do not 
bring us within a desired level of accuracy, then we will have to achieve a 
balance between the number of reasons that we can handle and the degree to 
which we can approach the final value of the target. Thanks to the condition 
of probabilistic support we can determine how many reasons we need in or- 
der to conclude that the rest of the chain is irrelevant. In this chapter we have 
explained the idea in quantitative terms, but it can quite easily be grasped in 
an intuitive and qualitative way. 
Chapter 6
Conceptual Objections 


Abstract 

There are two conceptual objections to the idea of justification by an infinite 
regress. First, there is no ground from which the justification can originate. 
Second, if a regress could justify a proposition, another regress could be 
found to justify its negation. We show that both objections are pertinent to a 
regress of entailments, but fail for a probabilistic regress. However, the core 
notion of such a regress, i.e. probabilistic support, leaves something to be 
desired: it is not sufficient for justification, so something has to be added. A 
threshold condition? A closure requirement? Both? Furthermore, the notion 
is said to have inherent problems, involving symmetry and nontransitivity. 


6.1 The No Starting Point Objection 


In the previous chapter we discussed the main pragmatic argument against 
justification by infinite chains, known as the finite mind objection. Perhaps 
even more serious, however, are the conceptual objections. They aim to show 
that even creatures with an infinite lifespan or with a mind that can handle 
infinitely long or complex chains will run into problems, because the very 
idea of justification is at odds with a chain of infinite length: 


conceptual arguments ... appeal ... to the incompatibility of the concept of
epistemic justification and infinite series of support.¹

1 Aikin 2011, 51.

Two conceptual objections in particular are often discussed. According to the
first, no proposition can ever be justified by an infinite regress, since in such
a regress justification is forever put off and never materializes. This is the
much-raised no starting point objection, as Peter Klein has called it, which
is based on the fact that an infinite chain is bereft of a source or a foundation
from which the justification could spring.² The second conceptual objection
goes beyond the first one, spelling out what would happen if the no starting 
point objection did not apply. If, per impossibile, a particular proposition 
q were justified by an infinite chain, then it can be demonstrated that all 
propositions could be justified in that manner, including the negation of q. 
This objection is known as the reductio argument, and it has been raised 
in different forms, notably by John Pollock, Tim Oakley, James Cornman, 
Richard Foley, and John Post.³

In the present section and in the next one we discuss the no starting point 
objection. We shall argue that a starting point is not needed if the regress is 
probabilistic — a conclusion which follows from the preceding chapters. In 
Sections 6.3 and 6.4 we shall deal with the reductio argument, showing that 
this objection, too, fails for a probabilistic regress. In the final section, 6.6, 
we note that the concept which is central to a probabilistic regress, viz. prob- 
abilistic support, is itself prone to problems. We elaborate on two properties 
of probabilistic support that are allegedly problematic for the concept of jus- 
tification, namely that probabilistic support is symmetric and that it lacks 
transitivity. 

The no starting point objection asserts that justification can never be cre- 
ated by inferences alone. The reason is that an infinite inferential chain 
blocks ab initio the possibility of justification. The only way to generate 
justification is by having a starting point, i.e. a proposition or a belief that is 
itself non-inferentially justified. Aikin phrases the objection as follows: 


... if reasons go on to infinity, then as far as the series goes, there will always
be a further belief necessary for all the preceding beliefs to be justified. If
there is no end to the chain of beliefs, then there is no justification for that
chain to inherit in the first place.⁴


The no starting point objection exploits the fact that in an infinite regress
justification seems to be indefinitely postponed and never cashed out. It is
as if we are given a cheque with which we go to a bank teller, who gives
us a new cheque and directs us to another bank teller, who hands us a third
cheque, instructing us to go to yet another bank teller, and so on and so
forth. Never do we encounter a bank teller who actually converts our current
cheque into bars of gold.

2 Klein 2000, 204. Cf. Laurence BonJour: “The result ... would be that justification
could never get started and hence that no belief would be genuinely justified”
(BonJour 1976, 282).

3 Pollock 1974, 29; Oakley 1976; Cornman 1977; Foley 1978; Post 1980.

4 Aikin 2011, 52.

Like the finite mind objection, this objection too has a long history, going 
back indeed to Aristotle. Aikin recalls some of the latest versions: 


William Alston captures the argument as follows: If there is a branch [of me- 
diately justified beliefs] with no terminus, that means that no matter how far 
we extend the branch the last element is still a belief that is mediately justified 
if at all. Thus, as far as this structure goes, whenever we stop adding elements 
we have still not shown that the relevant necessary condition for mediate jus- 
tification of the original belief is satisfied. Thus the structure does not exhibit 
the original belief as mediately justified [Alston 1986, 82]. 

Henry Johnstone captures the thought: ‘X infinitely postponed is not an X’ 
since the series of postponements shortly becomes ‘inane stammering’ [John- 
stone 1996, 96]. 

Romane Clark notes that such a series will produce only ‘conditional justi- 
fication’ [Clarke 1988, 373], and Timo Kajamies calls such support ‘incurably 
conditional’ [Kajamies 2009, 532]. 

The same kind of thought can be captured with an analogy. Take the one 
R.J. Hankinson uses in his commentary on Sextus: ‘Consider a train of infinite 
length, in which each carriage moves because the one in front of it moves. 
Even supposing that fact is an adequate explanation for the movement of each 
carriage, one is tempted to say, in the absence of a locomotive, that one still 
has no explanation for the motion as a whole. And that metaphor might aptly 
be transferred to the case of justification in general’ [Hankinson 1995, 189].⁵


In the same vein, Carl Ginet writes: 


Inference cannot originate justification, it can only transfer it from premises 
to conclusion. And so it cannot be that, if there actually occurs justification, 
it is all inferential ... [T]here can be no justification to be transferred unless 
ultimately something else, something other than the inferential relation, does 
create justification.⁶


Ginet cites Jonathan Dancy, who phrases the no starting point objection as 
follows: 


Justification by inference is conditional justification only; [when we justify A 
by inferring it from B and C] A’s justification is conditional upon the justifica- 
tion of B and C. But if all justification is conditional in this sense, then nothing 
can be shown to be actually, non-conditionally justified.⁷


5 Ibid., 53 — misspellings corrected. 
6 Ginet 2005, 148. 
7 Dancy 1985, 55. 
The no starting point objection is also at the heart of Richard Fumerton’s
“conceptual regress argument” against justificatory chains. On several occasions
Fumerton has distinguished between two “regress arguments” in support
of foundationalism: the epistemic and the conceptual regress argument.⁸
The first boils down to the finite mind objection against infinite chains. It 
states that “having a justified belief would entail having an infinite number 
of different justified beliefs” while in fact “finite minds cannot complete an 
infinite chain of reasoning”.⁹ In the previous chapter we have explained why
we think that this objection does not succeed. The conceptual regress argu- 
ment, on the other hand, appears to be a rewording of the no starting point 
objection. Fumerton calls it “quite different” from the epistemic regress argument,
and “more fundamental”.¹⁰ It states that an infinite justificatory chain
is vicious because we can only understand the concept of inferential justifi- 
cation if we accept that of noninferential justification: 


[I]f we are building the principle of inferential justification into an analysis of 
the very concept of justification, we have a more fundamental vicious concep- 
tual regress to end. We need the concept of a noninferentially justified belief 
not only to end the epistemic regress but to provide a conceptual building 
block upon which we can understand all other sorts of justification. I would 
argue that the concept of noninferential justification is needed ... in order to
understand other sorts of justification ... .¹¹


In other words, the very idea of inferential justification does not make sense 
without assuming justification that is noninferential, or, as Fumerton formu- 
lates it later, the concept of inferential justification is “parasitic” on that of 
noninferential justification: 


To complete our analysis of justification we will need a base clause — we will 
need a condition sufficient for at least one sort of justification the understand- 
ing of which does not already presuppose our understanding the concept of 
justification. But that sort of justification is just what is meant by noninferen- 
tial justification (justification that is not inferential). Our concept of inferen- 
tial justification is parasitic upon our concept of noninferential justification. It 
doesn’t follow, of course, that anything falls under the concept. But if nothing 
does, then there is no inferential justification either ...¹²


8 Fumerton 1995, Chapter 3; Fumerton 2004; Fumerton and Hasan 2010; Fumerton 
2014. 

9 Fumerton 1995, 89; 2004, 150; 2006, 40; 2014, 76. 

10 Fumerton 1995, 89; 2014, 76. 

11 Fumerton 1995, 89. 

12 Fumerton 2014, 76. 
Not surprisingly, Fumerton’s response to his conceptual regress argument
echoes the standard reply to the no starting point objection: the only way to
inject justification into an inferential chain is to assume a source from which 
the justification springs. Without such a source, the very concept of inferen- 
tial justification becomes unintelligible or even absurd, ‘inane stammering’ 
as Henry Johnstone would have it. 

A particularly interesting and generalized version of the no starting point
objection has been put forward by Carl Gillett.¹³ The problem with an infinite
chain of reasons, Gillett says, does not lie in its epistemological character as
such, but is more general: it has to do with its metaphysical structure,
which it shares with many vicious regresses outside epistemology. This
structure is such that the relevant dependent property (which in the
epistemological case is ‘being justified’) cannot be produced, owing to a
relation of dependence, which Gillett calls the ‘in virtue of’ relation. If a
proposition q is justified in virtue of A1 being justified, which in turn is justified
in virtue of A2 being justified, and so on, then it is notoriously unclear how any
of the propositions could be justified. Making the chain longer is of course no
solution, for irrespective of the number of propositions we add, each proposition
will only be justified because of another proposition. Thus, Gillett concludes,
there is no number of propositions that can be added “that will suffice for
any of its dependent properties to feed back to any members of the chain”.¹⁴
According to the ‘Structural Objection’, as Gillett has dubbed his particular 
version of the argument, the very structure of the epistemic regress prevents 
justification from arising. 

In none of these different formulations of the no starting point objection is 
it made clear what exactly is meant by epistemic justification. When for ex- 
ample Dancy complains that, “if all justification is conditional ... then noth- 
ing can be shown to be actually, non-conditionally justified”, it is not clear 
what he means by ‘conditional’ and ‘non-conditional’, since it remains open 
whether he sees justification as for example entailment or as involving prob- 
abilistic support (see Chapter 2). In the first case, his talk about conditional 
and non-conditional justification would refer to the difference between if- 
then statements and categorical statements; in the second case, it pertains to 
the difference between conditional and unconditional probability statements. 
The distinction is however vital in a discussion of the no starting point
objection. For while the objection applies to justification as entailment, as applied
to justification as probabilistic support it backfires completely. This result
was already intimated in the previous chapter, but we will explain it further
in the next section.

13 Gillett 2003.
14 Ibid., 713.


6.2 A Probabilistic Regress Needs No Starting Point 


It is not difficult to see why the no starting point objection applies if justifi- 
cation is interpreted as a kind of entailment. Consider the finite chain 


$$A_0 \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow \cdots \leftarrow A_m \leftarrow A_{m+1} \tag{6.1}$$


where the arrow represents entailment, where A0 does duty for the target,
q, and where A_{m+1} stands for the foundation or ground. Then of course the
only way to know for sure if A0 is true is by knowing that A_{m+1} is true. In
the words of Aikin: “Conceptual arguments start from the deep, and I think
right, intuition that epistemic justification should be pursuant of the truth”.¹⁵
But if we are ignorant of the truth or falsity of the ground, A_{m+1}, we are
groping in the dark about the truth value of A0. When we make chain (6.1)
infinite, so that it looks like: 


$$A_0 \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots \tag{6.2}$$


then the matter is worse: since there is no initiating A_{m+1}, there is no truth
value that is preserved in the first place. For the only way in which the tar- 
get can be justified is by receiving the property from its neighbour, which 
received it from its neighbour, and so on. If there is no origin from which the 
property is handed down, there is nothing to receive, so the no starting point 
objection applies in full force. 
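A small sketch makes the point concrete. Later in this section a regress of entailments is modelled by restricting all the probabilities to 0 or 1; here we take one such case, αn = P(An|An+1) = 1 and βn = P(An|¬An+1) = 0 for every link (our own choice), and apply the rule of total probability link by link, from the ground up to A0. The function name `target_probability` is hypothetical. Whatever the length of the chain, P(A0) simply repeats the value assigned to the ground: nothing is added along the way, so without a ground there is nothing to hand down.

```python
def target_probability(alphas, betas, ground):
    """P(A_0) for a finite chain, computed from the ground A_L up to A_0.

    Each step applies the rule of total probability:
    P(A_n) = P(A_n | A_{n+1}) P(A_{n+1}) + P(A_n | not A_{n+1}) (1 - P(A_{n+1})).
    """
    p = ground
    for alpha, beta in zip(reversed(alphas), reversed(betas)):
        p = alpha * p + beta * (1 - p)
    return p


# Entailment-like links: alpha_n = 1, beta_n = 0 (a hypothetical 0-or-1 assignment).
for length in (1, 10, 1000):
    ones, zeros = [1.0] * length, [0.0] * length
    # the printed list is [0.0, 0.3, 1.0] for every length: the ground is just passed along
    print(length, [target_probability(ones, zeros, g) for g in (0.0, 0.3, 1.0)])
```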

Things are very different when justification is interpreted probabilistically. 
Applied to a probabilistic chain, the no starting point objection means that 
the target can only be justified by a chain of conditional probabilities if we 
know the unconditional probability of the ground. That is, in order to know 
P(A0), we need to know not only all the P(A_i|A_{i+1}) and P(A_i|¬A_{i+1}), but
also the unconditional probability P(A_{m+1}). But if the chain is infinitely long,
there is no A_{m+1}, and thus there is no probability of A_{m+1} that can be known
in the first place. As a result, the no starting point objection concludes, there is no
way to know the value of P(A0).

15 Aikin 2011, 51.

In the previous chapters we have seen why this conclusion does not follow.
In all but the exceptional cases, the value of P(A0) can be determined
without having to know the value of some P(A_{m+1}). In fact, as we saw in
Chapter 5, in many cases we do not even need to know the values of all the 
conditional probabilities; once we have fixed a particular level of accuracy 
with which we are satisfied, we can decide how many conditional proba- 
bilities we need to know in order to attain that accuracy. If the number of 
conditional probabilities turns out to be too big to handle, then we must ad- 
just the accuracy level and make do with an approximation of the target’s 
true value with an error margin that is somewhat bigger than we had initially 
envisaged. So while the no starting point objection implies that in an infinite
regress the value of P(A0) either goes to zero or remains unknown, neither
of these two options actually obtains when the probabilistic regress is in the 
usual class. 
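The following sketch shows why the unknown ground loses its grip once the links are genuinely probabilistic. It applies the rule of total probability from the ground up, P(An) = αn P(An+1) + βn (1 − P(An+1)), with αn = P(An|An+1) and βn = P(An|¬An+1). The conditional probabilities vary from link to link; the formulas for them are hypothetical, chosen only so that probabilistic support holds with γn = αn − βn < 1. We then compare the two most extreme guesses about the ground, 0 and 1. Because each step is affine in the probability below it, the gap between the two results is exactly the product γ0γ1⋯γL−1, so it shrinks as the chain lengthens.

```python
from math import prod


def target_probability(alphas, betas, ground):
    """P(A_0) from the ground up, by repeated use of the rule of total probability."""
    p = ground
    for alpha, beta in zip(reversed(alphas), reversed(betas)):
        p = alpha * p + beta * (1 - p)
    return p


def chain(length):
    """Hypothetical non-uniform links: alpha_n = 0.95 + 0.04/(n + 2), beta_n = 0.04."""
    alphas = [0.95 + 0.04 / (n + 2) for n in range(length)]
    betas = [0.04] * length
    return alphas, betas


for length in (5, 50, 200):
    alphas, betas = chain(length)
    low = target_probability(alphas, betas, 0.0)    # ground assumed false
    high = target_probability(alphas, betas, 1.0)   # ground assumed true
    gap = prod(a - b for a, b in zip(alphas, betas))
    print(f"L={length:>3}: P(A0) in [{low:.5f}, {high:.5f}], "
          f"gap {high - low:.2e} (product {gap:.2e})")
```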

John Pollock has trenchantly criticized what he calls “the nebula theory” 
of justification: never can an infinite chain justify a target, since the chain’s 
ground is for all future time hidden in “a nebula”.¹⁶ Pollock would be right
that this is an insuperable problem so long as we are speaking about a regress
of entailments; but in a probabilistic regress the difficulty does not arise at
all. For all we care, A∞ may forever lie hidden in nebulae: in a probabilistic
regress that does not matter, since A∞ is completely irrelevant to the question
whether A0 is probabilistically justified or not.

Rather than talk about a nebula, we could also use the metaphor of a 
borehole. Compare the justification of a target by an epistemic chain to the 
pumping up of water from a deep well. If the chain is non-probabilistic, then 
the relations of entailment serve as neutral conduits through which justifica- 
tion passes unhindered. The justification itself comes from the bottom of the 
borehole, whence it is pumped up and transferred along the chain, whither it 
streams to the target proposition. If the epistemic chain is infinite, there is no 
beginning, the borehole is bottomless, the pumping stations forever remain 
dry, and no justification will ever gush out to the target. But now imagine that 
the infinite chain is probabilistic. Then a bottom is not needed. For now jus- 
tification does not surge up unchanged from source to target; rather it comes 
from the conditional probabilities, which jointly work to confer upon the tar- 
get proposition an acceptable probability. The conditional probabilities are, 
as it were, the intermediate pumping stations which actively take a moiety
of justification from the circumambient earth, rather than passively wait for 
what comes up through the borehole. In a probabilistic regress we deliver 
justification, albeit piecemeal, whereas in a non-probabilistic regress we are 
not able to produce anything at all. In the latter case there is nothing more
than pointing to a fathomless borehole, or to a bank teller beyond the end
of the universe who is supposed to administer my fortune.

16 Pollock 1974, 26-31.

Yet another metaphor was suggested to us by an anonymous reviewer; it 
concerns the saga of the bucket brigade. Suppose there is a fire and Abby 
gets her water from Boris, and Boris gets it from Chris, and Chris from Dan, 
and so on ad infinitum. It would seem that the fire will never be put out, 
since there is no first member of the brigade who actually dips his or her 
bucket into the lake. However, once we assume that justification involves 
probabilistic support the dousing operation looks quite different. Under this 
assumption, the proposition ‘Abby gets water from Boris’ ($A_0$) is only prob-
abilistically justified, and we can calculate the probability value of $A_0$ by
applying the rule of total probability that we cited earlier:


$$P(A_0) = P(A_0|A_1)P(A_1) + P(A_0|\neg A_1)P(\neg A_1), \tag{6.3}$$


where $A_1$ reads ‘Boris gets water from Chris’. Of course, whether Boris gets
water is also merely probable, and its probability depends on whether Chris 
gets water, and so on. We face here an infinite series of probability values 
calculated via the rule of total probability. As we know by now, we are per- 
fectly able to compute the outcome of this infinite series in a finite time: with 
the numbers that we used in the uniform case of the bacterium example in 
Section 3.7, the probability that Abby gets water is 3. 
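The rule of total probability (6.3) can be iterated along the brigade, starting from whoever is farthest away and working back to Abby. The Python sketch below is illustrative only: it assumes a uniform chain, in which every link has the same pair of conditional probabilities, and the values chosen for `alpha` (standing for $P(A_n|A_{n+1})$) and `beta` (standing for $P(A_n|\neg A_{n+1})$) are hypothetical, not the numbers of the bacterium example. Whatever probability is given to the farthest brigadier, the value for Abby settles on $\beta/(1-\alpha+\beta)$ as the chain grows.

```python
from fractions import Fraction


def target_probability(alpha, beta, ground, links):
    """Iterate the rule of total probability (6.3) along a uniform chain.

    alpha  -- P(A_n | A_{n+1}), assumed equal for every link (hypothetical)
    beta   -- P(A_n | not A_{n+1}), assumed equal for every link (hypothetical)
    ground -- probability assigned to the farthest member of a finite chain
    links  -- number of links between that member and the target A_0
    """
    p = ground
    for _ in range(links):
        # P(A_n) = P(A_n|A_{n+1}) P(A_{n+1}) + P(A_n|~A_{n+1}) P(~A_{n+1})
        p = alpha * p + beta * (1 - p)
    return p


alpha, beta = Fraction(9, 10), Fraction(3, 10)   # hypothetical values
limit = beta / (1 - alpha + beta)                # 3/4 for these values

for ground in (Fraction(0), Fraction(1)):
    values = [float(target_probability(alpha, beta, ground, n)) for n in (1, 5, 20, 60)]
    print(f"ground={ground}: {values}")
print("limit:", float(limit))
```

Starting from opposite extremes, the two rows of output converge on the same value: the farthest member's probability matters less and less.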
All four probabilities on the right-hand side of (6.3), the conditional as 
well as the unconditional ones, are supposed to have values strictly between 
zero and one (in the interesting cases). In contrast, the regress of entailments, 
in which justification is not probabilistic, can be modelled by restricting all 
four ‘probabilities’ to be 0 or 1. Within this non-probabilistic approach, Abby 
either gets water or she does not. According to the no starting point objec- 
tion, the moral of the saga about the bucket brigade is precisely that she does 
not get water — if the number of brigadiers is infinite. Because this is unac- 
ceptable, it is concluded that there must be a first firefighter on the shore of 
the lake who starts off the whole operation. In the probabilistic scenario the 
existence of a primordial firefighter is not needed, since the problem that it is 
supposed to solve does not arise in the first place. The reason is, as we have 
seen, that now the relations between the propositions are not idle channels, 
but actively contribute to the probability value of $A_0$; they for example allow
for a downpour somewhere along the line that fills the bucket. So if we take 
seriously that justification involves probabilistic support, then the probability 
that Abby extinguishes the fire can have a precise value, despite the infinite 
number of her team-mates. As in the examples that we considered above, 
this unconditional value is a function of all the conditional probabilities. 
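The contrast with a regress of entailments can be made concrete by restricting all four quantities in (6.3) to 0 or 1, as described above. This sketch keeps the hypothetical uniform-chain assumption (the name `chain_value` is ours): every admissible combination yields a target value of exactly 0 or 1, and with $P(A_n|A_{n+1}) = 1$ and $P(A_n|\neg A_{n+1}) = 0$ the target merely copies whatever value the farthest member has.

```python
from itertools import product


def chain_value(alpha, beta, ground, links):
    """Rule of total probability (6.3) applied once per link of a uniform chain."""
    p = ground
    for _ in range(links):
        p = alpha * p + beta * (1 - p)
    return p


# Restrict every 'probability' to 0 or 1, as in a regress of entailments.
outcomes = set()
for alpha, beta, ground in product((0, 1), repeat=3):
    for links in range(1, 30):
        outcomes.add(chain_value(alpha, beta, ground, links))
print(outcomes)  # only 0 and 1 occur: Abby either gets water or she does not

# With alpha = 1 and beta = 0 each link passes on its neighbour's value
# unchanged, so the target is fixed by the farthest member alone.
print([chain_value(1, 0, ground, 50) for ground in (0, 1)])
```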
Note that the above reasoning is independent of whether we embrace an
objective interpretation of probability (assuming, for example, that the fire-
fighters have propensities for handing over the water only now and then)
or a subjective interpretation (in which we specify our degree of belief in
$A_0$). Both the objective and the subjective interpretation are bound by the
rule of total probability, and that is all that counts here. This suggests that
our approach is not restricted to epistemological series, but might be applied
more generally to the metaphysical structures that Carl Gillett has been talk-
ing about. In fact, it might even be used to question similar lines of reasoning in ethics.
Richard Fumerton argued that his conceptual regress argument for founda- 
tionalism has a counterpart in the ethical realm. Suppose we are interested in 
whether an action, X, is good, and suppose we are being offered a series of 
conditional claims: if Y is good then X is good, if Z is good then Y is good, 
and so on, ad infinitum. Have we answered the original question? Fumerton 
believes we have not. At best we possess an infinite number of conditional 
claims, but this does not tell us whether X is good. Just as inferential justifica- 
tion only makes sense if there exists noninferential justification, instrumental 
goodness only makes sense if we assume that some things are intrinsically 
good: 


... the view that there is only instrumental goodness is literally unintelligible. 
To think that something X is good if all goodness is instrumental is that X 
leads to a Y that is good by virtue of leading to a Z that is good, by virtue 
of ..., and so on ad infinitum. But this is a vicious conceptual regress. The 
thought that X is good, on the view that all goodness is instrumental, is a 
thought that one could not in principle complete. The thought that a belief 
is justified, on the view that all justification is inferential, is similarly, the 
foundationalist might argue, a thought that one could never complete. 

Just as one terminates a conceptual regress involving goodness with the 
concept of something being intrinsically good, so one terminates a conceptual 
regress involving justification with the concept of a noninferentially justified 
belief.¹⁷


The concept of intrinsic goodness stands to the concept of instrumental good- 
ness as the concept of noninferential justification stands to the concept of 
inferential justification. Just as there are no good things without there being 
something that is intrinsically good, so also there are no inferentially justified
beliefs unless there are noninferentially justified beliefs.¹⁸


Fumerton would be right that instrumental goodness implies intrinsic good- 
ness if the conditional claims are of the form ‘if Y is good then X is good’. 


17 Fumerton 1995, 90. 
18 Fumerton 2014, 76. 
For then goodness is transferred lock, stock and barrel along the chain, and
the no starting point objection, or rather Gillett’s more general Structural Ob-
jection, applies in full force. However, we have been arguing that the sit-
uation changes radically if the claims take on the form ‘if Y is good then
there is a certain probability that X is good’ and ‘if Y is bad then there is a 
certain (lower) probability that X is good’, and so on. For now goodness is 
not transferred in its entirety along the series. Rather it slowly emerges as 
we progress from the links Z to Y and Y to X. In this probabilistic scenario 
the original question would be how probable it is that a certain action, X, is 
good. And this question can indeed be answered; as we have seen, with the 
numbers chosen, it is Z. 


6.3 The Reductio Argument 


According to the reductio argument, if an infinite chain could justify a target
$A_0$, then another infinite chain could be constructed that would justify the
target’s negation, $\neg A_0$. Since it does not make sense for a proposition and its
negation both to be justified, the proponents of this argument conclude that 
justification by an infinite chain is absurd. 

Like the no starting point objection, the reductio argument has taken on 
different formulations. Here we will concentrate on a version that was of- 
fered by John Post in a tightly argued paper, which is in fact an improved 
version of arguments that had been put forward by John Pollock and James 
Cornman.¹⁹

Post starts his argument by defining an infinite justificational regress as a 
“non-circular, justification-saturated regress”, by which he means that “every 
statement in the regress is justified by an earlier statement, and none is jus-
tified by any set of later statements”.²⁰ As we have seen in Chapter 2, Post
sees the justification relation as entailment, or better, ‘proper entailment’:
“if anything counts as an inferential justification relation, proper entailment
does ... If $A_n$ properly entails $A_{n-1}$, then $A_{n-1}$ is justified”.²¹ Now consider
again the infinite chain 


$$A_0 \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots \tag{6.4}$$


19 Post 1980; Pollock 1974, 28-29; Cornman 1977.
20 Post 1980, 3.
21 Ibid. Post has X and Y where we write $A_n$ and $A_{n-1}$.

where it is assumed that the propositions are connected by proper entailment
relations in the sense of Post, and where again $A_0$ does duty for the tar-
get q. According to Post, chain (6.4) is a non-circular, justification-saturated
regress if and only if the following three conditions are satisfied:


a. $A_n$ entails $A_{n-1}$ ($n > 0$);
b. $A_n$ is not entailed by any $A_{m<n}$;
c. $A_n$ is not justified on the basis of any set of $A_{m<n}$.


The first condition captures the idea that justification is a relation of entail- 
ment. The second condition is meant to ensure non-circularity. The third 
condition is added in order to block the possibility that a set of propositions 
might in some way or other together conspire to justify a proposition higher 
in the chain, which would make the regress circular after all. In the following 
we will always assume non-circularity in the background. 

The construction of (6.4) as a non-circular, justification-saturated regress 
presupposes that at every step of the regress there indeed exists some propo-
sition, $A_n$, which satisfies conditions a, b and c. Are there any examples of
(6.4) that do the job? According to Post there are many, since there are many 
forms of proper entailment which meet the three conditions above. One of 
them is obtained by using modus ponens to interpret the links in the chain as 
follows: 


$$\begin{aligned}
A_0 &= B_0\\
A_1 &= B_1 \wedge (B_1 \to B_0)\\
A_2 &= B_2 \wedge \big(B_2 \to (B_1 \wedge (B_1 \to B_0))\big)\\
A_3 &= B_3 \wedge \Big(B_3 \to \big(B_2 \wedge (B_2 \to (B_1 \wedge (B_1 \to B_0)))\big)\Big),
\end{aligned}\tag{6.5}$$

and by adding the restriction that $B_1$ is some proposition not entailed by
$A_0$, that $B_2$ is some proposition not entailed by $A_1$, and so on. Under these
restrictions it is the case that $A_1$ entails $A_0$, $A_2$ entails $A_1$, and so on; but $A_0$
does not entail $A_1$, $A_1$ does not entail $A_2$, and so on. Moreover, there is no set
of propositions that together justify a proposition higher in the chain, so the
conditions a, b and c are fulfilled.


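Since $B \wedge (B \to A)$ is formally equivalent to $B \wedge A$, (6.5) can also be written
as


$$\begin{aligned}
A_0 &= B_0\\
A_1 &= B_1 \wedge B_0\\
A_2 &= B_2 \wedge B_1 \wedge B_0\\
A_3 &= B_3 \wedge B_2 \wedge B_1 \wedge B_0,
\end{aligned}\tag{6.6}$$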
and so on, so that the chain (6.4) amounts to

$$B_0 \leftarrow (B_1 \wedge B_0) \leftarrow (B_2 \wedge B_1 \wedge B_0) \leftarrow (B_3 \wedge B_2 \wedge B_1 \wedge B_0) \leftarrow \cdots \tag{6.7}$$


Each link in (6.7) justifies its neighbour to the left, with the exception of $B_0$,
which has no left-hand neighbour.²²
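The claims made about (6.5) and (6.6) can be checked mechanically for the first few links. The Python sketch below assumes, as one way of meeting Post's restriction, that $B_0, B_1, \ldots$ are logically independent atoms, and it builds $A_n$ by the recursion $A_n = B_n \wedge (B_n \to A_{n-1})$ visible in (6.5). Truth tables then confirm that each $A_n$ entails $A_{n-1}$ but is entailed by no $A_m$ with $m < n$, and that $A_n$ is equivalent to the plain conjunction in (6.6). A finite check of course says nothing about the infinite chain as a whole.

```python
from itertools import product

N = 5  # number of links checked; the chain in (6.4) is infinite
ASSIGNMENTS = list(product((False, True), repeat=N + 1))  # values of B_0..B_N


def implies(x, y):
    return (not x) or y


def post_link(n, b):
    """A_n as in (6.5): A_0 = B_0 and A_n = B_n and (B_n -> A_{n-1})."""
    if n == 0:
        return b[0]
    return b[n] and implies(b[n], post_link(n - 1, b))


def conjunction(n, b):
    """A_n as in (6.6): B_n and ... and B_0."""
    return all(b[: n + 1])


def entails(f, g):
    """f entails g iff g holds under every assignment that makes f true."""
    return all(g(b) for b in ASSIGNMENTS if f(b))


def link(n):
    return lambda b: post_link(n, b)


for n in range(1, N + 1):
    assert entails(link(n), link(n - 1))                          # condition a
    assert not any(entails(link(m), link(n)) for m in range(n))   # condition b
    # (6.5) and (6.6) agree on every assignment
    assert all(post_link(n, b) == conjunction(n, b) for b in ASSIGNMENTS)
print(f"conditions a and b and the equivalence hold for n = 1..{N}")
```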

Does it make sense to say that (6.7) justifies $A_0$? Post rightly claims that it
does not. For in this manner a regress of propositions can be constructed for
any target proposition, in particular for the negation of $A_0$. We only need to
construct the infinite chain: 


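$$A'_0 \leftarrow A'_1 \leftarrow A'_2 \leftarrow A'_3 \leftarrow \cdots \tag{6.8}$$


where the $A'_n$ are interpreted as


$$\begin{aligned}
A'_0 &= \neg B_0\\
A'_1 &= B_1 \wedge (B_1 \to \neg B_0)\\
A'_2 &= B_2 \wedge \big(B_2 \to (B_1 \wedge (B_1 \to \neg B_0))\big)\\
A'_3 &= B_3 \wedge \Big(B_3 \to \big(B_2 \wedge (B_2 \to (B_1 \wedge (B_1 \to \neg B_0)))\big)\Big).
\end{aligned}\tag{6.9}$$


Chain (6.8) reduces to


$$\neg B_0 \leftarrow (B_1 \wedge \neg B_0) \leftarrow (B_2 \wedge B_1 \wedge \neg B_0) \leftarrow (B_3 \wedge B_2 \wedge B_1 \wedge \neg B_0) \leftarrow \cdots \tag{6.10}$$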


So if an infinite regress could justify a target proposition $A_0$, then another
could justify $\neg A_0$, which is of course absurd. Hence the reductio argument,
which shows that an infinite regress of proper entailments cannot justify a
proposition.
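The same mechanical check applies to the rival chain. The sketch below, again a hypothetical illustration treating the $B_i$ as independent atoms, builds $A'_n$ by the recursion visible in (6.9), starting from $\neg B_0$ instead of $B_0$. Each rival link entails its left-hand neighbour and is not entailed by it, exactly as in the original chain, yet every rival link entails $\neg B_0$, and no assignment makes $A_n$ and $A'_n$ true together.

```python
from itertools import product

N = 5
ASSIGNMENTS = list(product((False, True), repeat=N + 1))  # values of B_0..B_N


def implies(x, y):
    return (not x) or y


def chain_link(n, b, negate_target):
    """A_n from (6.5), or A'_n from (6.9) when negate_target is True."""
    if n == 0:
        return (not b[0]) if negate_target else b[0]
    return b[n] and implies(b[n], chain_link(n - 1, b, negate_target))


def entails(f, g):
    return all(g(b) for b in ASSIGNMENTS if f(b))


def original(n):
    return lambda b: chain_link(n, b, negate_target=False)


def rival(n):
    return lambda b: chain_link(n, b, negate_target=True)


def not_b0(b):
    return not b[0]


for n in range(1, N + 1):
    assert entails(rival(n), rival(n - 1))        # rival links behave like (6.5)
    assert not entails(rival(n - 1), rival(n))
    assert entails(rival(n), not_b0)              # but they support the negated target
    assert not any(original(n)(b) and rival(n)(b) for b in ASSIGNMENTS)
print(f"rival chain checked for n = 1..{N}")
```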

Both Peter Klein and Scott Aikin made an attempt to ward off the reduc- 
tio. Klein’s idea is that an infinite chain of proper entailments as set up by 
Post is necessary, but not sufficient for the justification of a target: in order 
to be sufficient, the propositions in the chain should also be “available” as 
reasons.²³ Aikin has argued that the only way to repel the reductio argument
is by taking a mixed view: infinitism and foundationalism do not exclude
one another, for a proposition can be both inferentially and noninferentially
justified.²⁴ Aikin here takes up an idea by Jay Harker, namely that not all
regresses of entailment make sense as justificatory chains, but that some do.
According to Harker, a regress merely of beliefs is insufficient; a justifica-
tory chain must contain relations to facts as well, although it may still be
infinite.²⁵


22 Eq. (6.6) is used in Oakley’s second argument against justification by infinite
regress (Oakley 1976, 227-228). Aikin calls (6.6) “the simplification reductio”
(Aikin 2011, 58).

23 Klein 1999, 312; Klein 2003, 722.

Thus Klein, Aikin and Harker all endorse the intuition that more is needed 
for justification than an infinite, unanchored chain of proper entailments; 
something has to be added to this chain in order to make it a justificatory 
chain. We fully share this intuition, but we think that a chain of entailments 
does not lend itself so easily to such an add-on — it is somehow too self- 
contained for that. What helps to prevent the reductio is to abandon the idea 
that the links in the chain are connected via proper entailment and to adopt 
connections through probabilistic support. Holding on to the assumption of 
entailment means strengthening the reductio argument; the argument is better
combated by assuming regresses to be probabilistic, as we will explain in the 
following section. 


6.4 How the Probabilistic Regress Avoids the Reductio 


In a standard finite chain such as (6.1), where the arrow represents entail-
ment, the ground $A_{n+1}$ is all-important: the truth value of the target $A_0$ is
a function of the truth value of $A_{n+1}$ and of nothing else. The story is basi-
cally the same in the infinite case. However, there is then no ground, which is
precisely the reason why it does not make sense to say that the target is jus-
tified. The concept of entailment is the culprit here, for it forces us to accept
two things that are hard to combine, namely that the ground is all-important
and non-existent at the same time. Exactly this combination precipitates the
reductio argument. Nothing now restricts us in gratuitously constructing a
rivalling regress that ‘justifies’ the target’s negation, since the only restric- 
tion that matters, to wit the truth value of the ground, is conspicuous by its 
very absence. 

The situation is entirely different in an infinite probabilistic regress. True,
there too a ground is lacking. But this is irrelevant, for the probability value
of the target is a function of the conditional probabilities alone. So it all
depends on the question: What, in a justificatory chain, determines the value of
the target? In a standard chain of entailments, the truth value of the target is
determined by that of the ground, independently of the length of the chain.
In a probabilistic chain, however, the length of the chain is relevant. If the
probabilistic chain is finite, then the target’s probability value is a function of
both the unconditional probability of the ground and the conditional proba-
bilities. As the chain gets longer, the influence of the ground decreases while
the influence of the combined conditional probabilities increases. In the limit
that the chain goes to infinity, only the conditional probabilities matter, and
the role of the ground has died out (in the usual class). In this regard the
difference between a non-probabilistic and a probabilistic regress could not
be greater: in the former, the only variable that counts is a function of the
ground, whereas in the latter the ground is of no significance whatsoever.²⁶


24 Aikin 2011, 59-60 and Chapter 3.

25 Harker 1984. Selim Berker takes a comparable route, offering the infinitist a way
to avoid a fundamentalist regression stopper without running the gauntlet of the
reductio argument; in Section 8.6 we will briefly come back to Berker.
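The shrinking role of the ground can be quantified. Because each application of (6.3) is linear in the probability of the next member, $P(A_0)$ is linear in the probability of the ground, and its coefficient is the product of the differences $P(A_k|A_{k+1}) - P(A_k|\neg A_{k+1})$. The Python sketch below, with hypothetical non-uniform conditional probabilities chosen only for illustration, computes that coefficient as the difference between the target values obtained with ground probability 1 and with ground probability 0, and compares it with the product.

```python
import math
from fractions import Fraction


def target(alphas, betas, ground):
    """P(A_0) for a finite chain A_0 <- A_1 <- ... <- A_n, with A_n the ground.

    alphas[k] = P(A_k | A_{k+1}) and betas[k] = P(A_k | not A_{k+1}).
    """
    p = ground
    # Work from the ground back towards the target.
    for a, b in zip(reversed(alphas), reversed(betas)):
        p = a * p + b * (1 - p)
    return p


def ground_weight(alphas, betas):
    """Coefficient of P(ground) in P(A_0); P(A_0) is linear in P(ground)."""
    return target(alphas, betas, Fraction(1)) - target(alphas, betas, Fraction(0))


# Hypothetical, non-uniform conditional probabilities (illustration only).
alphas = [Fraction(9, 10) - Fraction(1, 10 * (k + 2)) for k in range(40)]
betas = [Fraction(1, 5) + Fraction(1, 20 * (k + 2)) for k in range(40)]

for n in (1, 5, 10, 20, 40):
    weight = ground_weight(alphas[:n], betas[:n])
    product = math.prod(a - b for a, b in zip(alphas[:n], betas[:n]))
    print(n, float(weight), weight == product)
```

As the chain lengthens the weight of the ground shrinks towards zero, which is the fading of foundations in numerical form.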

We may conclude that the reductio argument misfires when the regress 
is a probabilistic one. The argument hinges on the assumption that the only 
variable which is responsible for the truth value of the target, namely the 
truth value of the ground, is non-existent. This absence of a ground allows us 
to concoct as many free-floating regresses as we wish, since the only variable 
that would determine the truth value of the target, viz. the truth value of the 
ground, is forever postponed and never actualized. In a probabilistic regress, 
on the other hand, the non-existent ground is not pertinent to the probability 
value of the target. 

However, one could argue that this is too easy. For is it not possible to con- 
struct a rivalling probabilistic regress, i.e. a regress that supports the negation 
of our target? The only thing we would have to do is to come up with a set 
of conditional probabilities that numerically, and thus purely formally, be- 
stow upon the target a probability value that for example exceeds the chosen 
threshold. If these conditional probabilities are not in any way connected to 
the world, we can cook them up ad libitum. We could then well end up with 
two rivalling probabilistic regresses, one probabilistically justifying $A_0$, and
the other one probabilistically justifying $\neg A_0$.

Although the above argument is formally valid, it is not applicable to the 
issue that we are talking about. For it only works if the conditional proba- 
bilities are regarded as free variables, whose values may be chosen at will. 
We are however interested in epistemic justification, i.e. in the justification 
of propositions about our knowledge of how the world actually is, and this 
means that the conditional probabilities are not freely chosen. On the con-
trary, as we explained in Section 4.4, in a probabilistic regress the conditional
probabilities carry all the empirical thrust. Once we admit empirically deter-
mined conditional probabilities, we are not free to invent other conditional
probabilities in a competing regress for the negation of our target proposi-
tion: the conditional probabilities are determined too, and they yield a prob-
ability for the negation of the target that is one minus the probability of the
target. If the target probability clears a threshold of acceptance greater than
one half, the probability of the negation of the target will not do so.


26 See also Peijnenburg and Atkinson 2014a.
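The point about empirically fixed conditional probabilities can be illustrated numerically. Once $P(A_n|A_{n+1})$ and $P(A_n|\neg A_{n+1})$ are fixed, the conditional probabilities for the chain of negations $\neg A_0 \leftarrow \neg A_1 \leftarrow \cdots$ are fixed as well, namely $P(\neg A_n|\neg A_{n+1}) = 1 - P(A_n|\neg A_{n+1})$ and $P(\neg A_n|A_{n+1}) = 1 - P(A_n|A_{n+1})$. The Python sketch below borrows the values 0.99 and 0.04 that this chapter uses for Table 4.2, together with a hypothetical ground probability and chain length; it confirms that the 'rival' regress returns exactly one minus the target probability, so that at most one of the two can clear a threshold above one half.

```python
from fractions import Fraction


def regress(p_if_true, p_if_false, ground, links):
    """Uniform regress: P(X_n) = P(X_n|X_{n+1}) P(X_{n+1}) + P(X_n|~X_{n+1}) P(~X_{n+1})."""
    p = ground
    for _ in range(links):
        p = p_if_true * p + p_if_false * (1 - p)
    return p


alpha, beta = Fraction(99, 100), Fraction(4, 100)  # P(A_n|A_{n+1}), P(A_n|~A_{n+1})
ground, links = Fraction(1, 2), 60                  # hypothetical

p_target = regress(alpha, beta, ground, links)
# Regress for the negations ~A_n: its conditional probabilities are not free,
# they follow from alpha and beta.
p_negation = regress(1 - beta, 1 - alpha, 1 - ground, links)

threshold = Fraction(3, 4)
print(float(p_target), float(p_negation))
print(p_target >= threshold, p_negation >= threshold)
```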

Our opponent might not be satisfied, and complain that it remains unclear 
how conditional probabilities can carry empirical information; after all, the 
interface between our propositions and the world is fraught with difficulty. To 
this we would reply that, of course, such difficulties exist, and they are well 
documented; the problem of finding a transducer between our propositions 
and the world cuts deep and might even turn out to be insoluble. But as 
we made clear in Section 4.4, our aim is not to say something about that 
problem: we are not trying to formulate an answer to the sceptic. Rather our 
aim is to draw attention to probabilistic regresses and to phenomena such 
as those of fading foundations and of the emergence of justification, and to 
point out that these phenomena have consequences for the age-old objections 
to infinite regresses. 

Andrew Cling has argued that an infinite regress can only justify a propo- 
sition if a certain condition is satisfied, notably that the regress is not “pure 
fiction” but has “grounding in how things are, are likely to be, or are reason-
ably believed to be”.²⁷ The trouble with infinitism, says Cling, is that this
condition can only be satisfied if simultaneously the very idea of justifica- 
tion by an infinite regress is undermined. Our analysis indicates that Cling is 
correct if the justificatory regress is a regress of entailments, not if it is prob- 
abilistic. For a probabilistic regress, as we have seen, can probabilistically 
justify a proposition while still having entry points for the world in the form 
of the conditional probabilities. 

We have provisionally argued that these conditional probabilities arise 
from experiments, but of course they are not indubitable, and they can be 
questioned in turn. In that case they become the targets of new probabilistic 
chains. As we will explain in Section 8.5, this takes us from one-dimensional 
chains to multi-dimensional networks, where the effect of fading foundations 
still obtains.²⁸


27 Cling 2004, 111; see also Moser 1985, who makes a point similar to that of Cling. 
28 William Roche doubts whether a probabilistic regress can take away Cling’s 
worry (Roche 2016). We think that it can indeed, for the reasons explained here 
and in Sections 4.4 and 8.5. 
It would be foolhardy to claim that probabilistic support along a chain of
propositions or beliefs is sufficient for their justification. An obvious ob- 
jection to such a claim would be that, after all the contributions from the 
conditional probabilities have been summed, the resulting probability of the 
target might turn out to be less than a half, which means that, relative to this 
particular chain, the target would be more likely false than true. Under these 
circumstances one would not say that the chain justifies the target. Indeed, 
as we have stressed, something must be added to probabilistic support to 
achieve a sufficient condition for justification. 

Although it is certainly not our ambition to answer the difficult question of 
sufficiency, we shall in this section discuss two additional candidate desider- 
ata for justification. The first is simply a threshold constraint on the target 
probability; the second is a modified threshold requirement for a measure 
of justification that has been proposed by Tomoji Shogenji. We first look 
at the simple threshold constraint, using the tables in Chapter 4 as illustra- 
tion. We recall the well-known fact that this constraint falls foul of the in- 
tuition that justification should be closed under conjunction. But should un- 
restricted closure be a desideratum for justification? We argue that it should 
not: closure should be required only for independent propositions. The sim- 
ple threshold constraint does not respect this modified closure requirement, 
and so it should be rejected. Shogenji’s threshold condition, however, does 
respect this modified closure requirement. What makes Shogenji’s condition 
especially interesting for us, moreover, is that it sails between entailment 
and probabilistic support: it is stronger than mere probabilistic support, but 
weaker than entailment. It is therefore a refined desideratum for justification; 
but we are not so incautious as to claim that it is a sufficient condition. 

The simple threshold constraint amounts to the introduction of a context-
dependent threshold of acceptance, say t, that is greater than one-half, but
less than one.²⁹ As a first attempt, we might propose that if q is justified to
degree t by a single proposition, or by a finite or infinite chain of propo-
sitions, then there must be probabilistic support along the chain, and P(q)
must not be less than t. Here is an example. Suppose that we take t = 3/4 and
refer to the tables in Chapter 4. We see from Table 4.1 that P(q) does not
clear 3/4 with a chain of ten or fewer intermediate A’s, but that it does so with
a chain of twenty-five or more intermediate A’s.


29 Carnap 1980, 43, 70, 107; Fitelson 2013. 
For a second example, look at Table 4.2, and again let t = 3/4. Now we
see that P(q) clears the threshold in all cases, even when there is only one
intermediate A. The reason for this is simply that the ground p has a high
probability; and in connection with the chosen values of the conditional prob-
abilities α and β (0.99 and 0.04) this means that P(q) already exceeds the
threshold of 3/4 after one step. Had α and β both been small, then the situation
would have been very different; for then no number of steps would have been
enough to reach the threshold, no matter how large the probability of p was.
It can also happen that the value of P(q) is larger than the threshold after a 
few steps, but sinks below the threshold if the chain gets longer. This can be 
illustrated by appealing to Table 4.2 again, and adopting the more demand- 
ing threshold of t = 0.85 instead of 0.75. With ten or fewer steps this more 
stringent threshold is exceeded, but with twenty-five or more steps we see 
that P(q) has sunk below the new threshold. In such a case q might appear 
to be justified (to degree 0.85), but later, as the chain lengthens, we discover 
that this is not so. 
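The way P(q) can first clear a demanding threshold and later sink below it follows from the uniform-chain formula. The Python sketch below uses α = 0.99 and β = 0.04 as in the text, but the ground probability P(p) = 0.95 is a hypothetical stand-in, not the value behind Table 4.2, and 'steps' here counts applications of the rule of total probability, which need not coincide with how the table counts intermediate A's. Under these assumptions P(q) approaches β/(1 − α + β) = 0.8 from above, so it stays above 0.75 throughout but drops below 0.85 once the chain is long enough.

```python
def p_target(alpha, beta, ground, steps):
    """P(q) after `steps` applications of the rule of total probability."""
    p = ground
    for _ in range(steps):
        p = alpha * p + beta * (1 - p)
    return p


ALPHA, BETA = 0.99, 0.04      # conditional probabilities quoted for Table 4.2
GROUND = 0.95                 # hypothetical P(p), not taken from the table

for steps in (1, 10, 25, 200):
    p = p_target(ALPHA, BETA, GROUND, steps)
    print(f"{steps:>3} steps: P(q) = {p:.4f}, "
          f"clears 0.75: {p >= 0.75}, clears 0.85: {p >= 0.85}")
print("limit:", BETA / (1 - ALPHA + BETA))
```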

Now consider still another example. Let the conditional probabilities both 
be very large, for example 0.99 and 0.96. Here again the target proposition, q, 
will have a probability well in excess of the threshold of 3/4, even when there
is only one intermediate A. And this is so irrespective of what the probability 
of p might be. Here the joint conditional probabilities are already doing all 
the work. On the other hand, if both conditional probabilities are very small, 
then the probability of q will be very small, again irrespective of P(p). This 
is because the rule of total probability shows that P(q) is an interpolation 
between the two conditional probabilities, $P(q|A_1)$ and $P(q|\neg A_1)$. In such a
case the target could not be justified by the regress. 
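The interpolation remark can be checked directly: by the rule of total probability, $P(q) = P(q|A_1)P(A_1) + P(q|\neg A_1)(1 - P(A_1))$ is a weighted average of the two conditional probabilities, so it lies between them whatever $P(A_1)$ is. The short Python sketch below scans a grid of hypothetical values for $P(A_1)$; the pair (0.05, 0.02) merely stands in for 'very small' conditional probabilities.

```python
def total_probability(p_given, p_given_not, p_evidence):
    """P(q) = P(q|A_1) P(A_1) + P(q|~A_1) (1 - P(A_1))."""
    return p_given * p_evidence + p_given_not * (1 - p_evidence)


grid = [k / 100 for k in range(101)]          # hypothetical values of P(A_1)

# Both conditional probabilities large: P(q) never falls below 0.96.
high = [total_probability(0.99, 0.96, x) for x in grid]
# Both conditional probabilities small: P(q) never rises above 0.05.
low = [total_probability(0.05, 0.02, x) for x in grid]

print(min(high), max(low))
```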

What these examples show is that the conditional probabilities, together 
with the unconditional probability of p, determine how long it takes before 
P(q) reaches the threshold, if indeed it does so. Sometimes the uncondi- 
tional probability of p has considerable influence, sometimes its influence is 
smaller: it is all contingent on the particular values. In the case of an infi- 
nite regress in the usual class, if the probability clears the threshold, this is 
achieved by the infinite set of conditional probabilities alone, without any 
contribution from p. 

However, requiring that justification implies that the target probability 
meet a threshold of acceptance runs into difficulties, as we have intimated. 
For if target propositions q and q’ are each supported by $A_1$, and if each meets
some threshold, t, which is strictly less than one, it does not follow that the
probability of the conjunction of q and q’ meets t. Should we require that, if
propositions q and q’ are each separately justified by the same evidence $A_1$,
then the proposition ‘q and q′’ is justified by the same evidence $A_1$? That is,
should we require that justification is closed under conjunction? To see that 
an unqualified ‘yes’ would be too quick an answer, let us look at a simple 
example. Suppose that a fair die is tossed, but not yet inspected. Let q be the 
proposition ‘the die shows 5’, and q’ be the proposition ‘the die shows 6’. Let 
$A_1$ be the proposition ‘the die shows more than 4’. Then $P(q) = P(q') = 1/6$,
and $P(q|A_1) = P(q'|A_1) = 1/2$, so both q and q’ are probabilistically supported
by $A_1$. However, q and q’ are incompatible with one another, so $P(q \wedge q') = 0$;
and of course we would not want to claim that $A_1$ justifies the impossibil-
ity $q \wedge q'$. The conclusion is that we should not allow unlimited closure of
justification under conjunction. This is of course the lesson that many peo-
ple have drawn from the lottery paradox and similar quandaries concerning
unrestricted closure of justification under conjunction. If one is justified in
believing that ticket $t_i$ in a fair lottery will lose, and that ticket $t_j$ will also
lose, is one justified in believing to the same extent that both $t_i$ and $t_j$ will
lose? Evidently not, for the two failures to win are not independent of one
another: if $t_i$ loses, the chance that $t_j$ will lose is reduced.
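The single-die example can be verified by counting equally likely outcomes. In the Python sketch below, the helper `prob` and the predicate names are ours: `q_prime` stands for q’ and `a1` for $A_1$.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # a fair die, all six faces equally likely


def prob(event, given=lambda x: True):
    """P(event | given), by counting equally likely outcomes."""
    pool = [x for x in OUTCOMES if given(x)]
    return Fraction(sum(1 for x in pool if event(x)), len(pool))


def q(x):
    return x == 5          # 'the die shows 5'


def q_prime(x):
    return x == 6          # 'the die shows 6'


def a1(x):
    return x > 4           # 'the die shows more than 4'


def both(x):
    return q(x) and q_prime(x)


print(prob(q), prob(q_prime))          # each 1/6
print(prob(q, a1), prob(q_prime, a1))  # each 1/2: A_1 supports both
print(prob(both, a1))                  # 0: the conjunction is impossible
```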

If unrestricted closure is forbidden, what would be a reasonable require- 
ment concerning closure? Look at another example: suppose now that two 
coloured dice are tossed, but not yet inspected. Let q be the proposition
‘the red die shows 5’, and let q’ be the proposition ‘the blue die shows
6’, and let $A_1$ be the proposition ‘each die shows more than 4’. Once more
$P(q) = P(q') = 1/6$, and $P(q|A_1) = P(q'|A_1) = 1/2$, so again both q and q’ are
probabilistically supported by $A_1$ to the same degree. Now q and q’ are com-
patible, moreover they are independent of one another, both unconditionally
and conditionally: 


$$\begin{aligned}
P(q \wedge q') &= P(q)P(q') = \tfrac{1}{36},\\
P(q \wedge q'|A_1) &= P(q|A_1)P(q'|A_1) = \tfrac{1}{4}.
\end{aligned}$$


Again $A_1$ supports q and q’ probabilistically, but it also supports the conjunc-
tion, $q \wedge q'$, for $P(q \wedge q'|A_1) > P(q \wedge q')$. Note that the degree of probabilistic
support that $A_1$ gives to the conjunction $q \wedge q'$ is not the same as the degree
of support it gives to the conjuncts. However, if $A_1$ justifies q and q’, then it is
reasonable to require that A, justifies the conjunction to the same degree as it 
justifies the conjuncts. After all, if one is justified (to some extent) in expect- 
ing the red die to show 5, and also in expecting the blue die to show 6, on the 
basis of knowledge that each of the dice shows either 5 or 6, then one should 
be justified, to the same extent, in expecting that the red die shows 5 and the 
blue die shows 6, on the same knowledge basis. That the red die shows 5 
does not influence whether the blue die shows 6. Evidently the requirement
that the probability clear a threshold of acceptance is not an adequate crite- 
rion; and it must be rejected as a desideratum for justification. The problem 
now is to find a measure of justification that clears a threshold and respects 
the findings of the above dice scenarios, and others like it. 
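The two-dice case can be checked the same way. The sketch below enumerates the 36 equally likely (red, blue) outcomes; as before, `prob`, `q_prime` and `a1` are our own illustrative names. It confirms the figures given for the coloured dice and the independence of q and q’ both unconditionally and given $A_1$.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product(range(1, 7), repeat=2))  # (red, blue), 36 equally likely


def prob(event, given=lambda o: True):
    """P(event | given), by counting equally likely outcomes."""
    pool = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))


def q(o):
    return o[0] == 5                 # 'the red die shows 5'


def q_prime(o):
    return o[1] == 6                 # 'the blue die shows 6'


def a1(o):
    return o[0] > 4 and o[1] > 4     # 'each die shows more than 4'


def both(o):
    return q(o) and q_prime(o)


print(prob(q), prob(q, a1))          # 1/6 and 1/2, likewise for q_prime
print(prob(both), prob(both, a1))    # 1/36 and 1/4
# Independence, unconditionally and conditionally on A_1:
print(prob(both) == prob(q) * prob(q_prime),
      prob(both, a1) == prob(q, a1) * prob(q_prime, a1))
```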

Tomoji Shogenji has constructed just such a measure of justification.³⁰
Suppose that q and q’ are independent, both unconditionally and also when
conditioned by $A_1$. Suppose further that both q and q’ have measures of justi-
fication greater than some threshold of acceptance, s. Then Shogenji requires
that their conjunction $q \wedge q'$ also has a measure of justification greater than
s. Thus his measure $J(q,A_1)$, the justification that $A_1$ bestows on q, respects
closure in the restricted sense.

Measure $J(q,A_1)$ is a function of the various probabilities associated with
q and $A_1$. But which function should it be? There are three independent
candidates for the arguments of the function, for example P(q), $P(A_1)$ and
$P(q|A_1)$. Shogenji’s first step is to strike out $P(A_1)$, on the grounds that,
if one were to conjoin to $A_1$ some independent and irrelevant proposition, I,
the justification that $A_1 \wedge I$ gives to q should be the same as that given by
$A_1$. But $P(A_1 \wedge I) = P(A_1)P(I)$, and so the degree of justification would be
changed by the conjunction if the measure were to depend on $P(A_1)$. So the
required measure of justification must be a function, f, of P(q) and $P(q|A_1)$
alone:³¹


$$J(q,A_1) = f\big(P(q),\, P(q|A_1)\big)$$


This immediately rules out the confirmation measure

$$S(q,A_1) = P(q|A_1) - P(q|\neg A_1),$$

as a candidate for a measure of justification, since that may be rewritten as


$$S(q,A_1) = \frac{P(q|A_1) - P(q)}{1 - P(A_1)},$$


which is manifestly a function of $P(A_1)$, as well as of P(q) and $P(q|A_1)$.³²
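Both the rewriting of S and its dependence on $P(A_1)$ can be checked with exact arithmetic. In the Python sketch below the three input probabilities are hypothetical. Conjoining an independent proposition I with P(I) = 1/2 to the evidence leaves P(q) and, under the independence assumptions of footnote 31, $P(q|A_1 \wedge I)$ unchanged, but it halves the probability of the evidence, and S changes accordingly.

```python
from fractions import Fraction


def s_measure(p_q_given_e, p_q, p_e):
    """S(q, E) = P(q|E) - P(q|~E), written as (P(q|E) - P(q)) / (1 - P(E))."""
    return (p_q_given_e - p_q) / (1 - p_e)


# Hypothetical inputs: P(q|A_1), P(q|~A_1) and P(A_1).
p_q_a, p_q_not_a, p_a = Fraction(1, 2), Fraction(1, 10), Fraction(1, 4)
p_q = p_q_a * p_a + p_q_not_a * (1 - p_a)      # rule of total probability

s_direct = p_q_a - p_q_not_a
s_rewritten = s_measure(p_q_a, p_q, p_a)
print(s_direct, s_rewritten)                   # the two forms agree

# Evidence A_1 & I, with I independent and irrelevant, P(I) = 1/2:
# P(q|A_1 & I) = P(q|A_1) and P(q) stay put, but P(A_1 & I) = P(A_1) P(I).
p_i = Fraction(1, 2)
s_with_i = s_measure(p_q_a, p_q, p_a * p_i)
print(s_with_i)                                # differs from s_direct
```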
Evidently the standard measure of confirmation,

$$D(q,A_1) = P(q|A_1) - P(q),$$

does satisfy Shogenji’s first desideratum for J. As we remarked in Chapter 2,
Carnap called this an “increase in firmness”, the extent to which the probabil-
ity of q is increased by conditioning it on $A_1$. Shogenji requires that $J(q,A_1)$
should increase if $P(q|A_1)$ increases while P(q) is held fixed, and decrease
if P(q) increases while $P(q|A_1)$ is held fixed. It is clear that the measure D
does these things.


30 Shogenji 2012.
31 Note that $P(q|A_1 \wedge I) = P(q|A_1)$, if I is independent of $A_1$ and of $q \wedge A_1$.
32 $S(q,A_1)$ is of course the same as γ.

Could $D$ be the required measure of justification, $J$? Not so, as we can see
from the example of the coloured dice, since

$$D(q,A_1) = D(q',A_1) = \tfrac{1}{2} - \tfrac{1}{6} = \tfrac{1}{3},
\qquad
D(q \wedge q',A_1) = \tfrac{1}{4} - \tfrac{1}{36} = \tfrac{2}{9},$$

which are different, whereas the degree of justification of the conjunction of
the independent propositions $q$ and $q'$ should be the same as that for $q$ and $q'$
separately. But not only does $D$ not satisfy this closure requirement, none of
the many other measures of confirmation do so either!33
Shogenji shows that the following new measure does satisfy the requirement
of closure:

$$J(q,A_1) = 1 - \frac{\log P(q|A_1)}{\log P(q)}. \tag{6.11}$$

Although this is not the only function that satisfies Shogenji’s desiderata for
a measure of justification, it has been proved that all functions that do so are
ordinally equivalent to Shogenji’s $J$ function.34 That is to say, if $A_1$ gives
a higher degree of justification to one proposition than it does to another,
according to the measure (6.11), then this ordering of justificatory degrees
will be the same for any other measure that satisfies Shogenji’s conditions.
We may say that the measure (6.11) is the unique solution of the problem,
up to ordinal equivalence. A proof of the above is given in Appendix B; but
here we shall simply check that the Shogenji measure works properly for our
coloured dice. From (6.11) we calculate

$$J(q,A_1) = J(q',A_1) = 1 - \frac{\log\frac{1}{2}}{\log\frac{1}{6}},$$

$$J(q \wedge q',A_1) = 1 - \frac{\log\frac{1}{4}}{\log\frac{1}{36}}
= 1 - \frac{2\log\frac{1}{2}}{2\log\frac{1}{6}}
= 1 - \frac{\log\frac{1}{2}}{\log\frac{1}{6}}.$$


33 See Atkinson, Peijnenburg and Kuipers 2009 for a list of ten measures of confir- 
mation. A seminal paper on different measures of confirmation is Fitelson 1999. 
34 Atkinson 2012. 
Thus $J(q,A_1) = J(q',A_1) = J(q \wedge q',A_1)$, so if $J(q,A_1) > s$ and $J(q',A_1) > s$,
for some $s$, it is trivially the case that $J(q \wedge q',A_1) > s$. In words, if $q$ and $q'$
are Shogenji-justified to the same degree, their conjunction is also Shogenji-justified
to that degree, as should be the case.
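The coloured-dice comparison can be checked numerically. The sketch below is only an illustration and does not reproduce Shogenji's own apparatus. It assumes the values used in the calculations above: $P(q) = P(q') = 1/6$, $P(q|A_1) = P(q'|A_1) = 1/2$, and independence of $q$ and $q'$ both unconditionally and given $A_1$, so that the probabilities of the conjunction are products. The function names are illustrative.

```python
import math
from fractions import Fraction

# Assumed values, read off the coloured-dice calculation in the text:
# P(q) = P(q') = 1/6 and P(q|A1) = P(q'|A1) = 1/2, with q and q' independent
# both unconditionally and conditional on A1.
P_Q = Fraction(1, 6)
P_Q_GIVEN_A1 = Fraction(1, 2)


def difference_measure(p_cond, p_prior):
    """Confirmation measure D(q, A1) = P(q|A1) - P(q)."""
    return p_cond - p_prior


def shogenji_j(p_cond, p_prior):
    """Shogenji's J(q, A1) = 1 - log P(q|A1) / log P(q), as in (6.11)."""
    return 1 - math.log(p_cond) / math.log(p_prior)


# Independence lets the probabilities of the conjunction factorise.
p_conj = P_Q * P_Q                              # P(q and q') = 1/36
p_conj_given_a1 = P_Q_GIVEN_A1 * P_Q_GIVEN_A1   # P(q and q' | A1) = 1/4

d_single = difference_measure(P_Q_GIVEN_A1, P_Q)
d_conj = difference_measure(p_conj_given_a1, p_conj)
j_single = shogenji_j(float(P_Q_GIVEN_A1), float(P_Q))
j_conj = shogenji_j(float(p_conj_given_a1), float(p_conj))

print("D:", d_single, "for q,", d_conj, "for the conjunction")
print(f"J: {j_single:.4f} for q, {j_conj:.4f} for the conjunction")
```

Under conjunction, $D$ drops from 1/3 to 2/9. The two $J$ values agree up to floating-point rounding, which is the closure behaviour described in the text.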

If the degree of Shogenji justification that $A_1$ gives to $q$ is not less than $s$,
i.e. $J(q,A_1) \ge s$, then

$$1 - \frac{\log P(q|A_1)}{\log P(q)} \ge s,$$

and this can be recast in the form (see Appendix B)

$$P(q|A_1) \ge [P(q)]^{1-s}. \tag{6.12}$$

Note that when $s = 0$, so that there is effectively no threshold, this inequality
reduces to

$$P(q|A_1) \ge P(q),$$

which is equivalent to our condition of probabilistic support (or neutrality, in
the case of the equals sign). On the other hand, when the threshold is at its
maximum, so that $s = 1$, the relation becomes

$$P(q|A_1) \ge 1, \quad \text{which of course implies} \quad P(q|A_1) = 1,$$

since no probability can be greater than one. This is the probabilistic condition
that corresponds to entailment.
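The passage from $J(q,A_1) \ge s$ to (6.12) is left to Appendix B. Here is a short sketch of the step. It assumes $0 < P(q) < 1$ and $P(q|A_1) > 0$, so that both logarithms are finite and $\log P(q)$ is negative:

$$J(q,A_1) \ge s \iff \frac{\log P(q|A_1)}{\log P(q)} \le 1 - s \iff \log P(q|A_1) \ge (1-s)\log P(q) \iff P(q|A_1) \ge [P(q)]^{1-s}.$$

The second equivalence multiplies through by the negative number $\log P(q)$, which reverses the inequality. The last exponentiates both sides. Setting $s = 0$ and $s = 1$ in the final form gives back the two limiting cases just discussed.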

For non-extremal values of the degree s, the measure J interpolates be- 
tween probabilistic support and entailment. Since entailment is too strong 
a requirement for a viable understanding of justification, and probabilistic 
support is too weak, it is very suggestive that this measure of Shogenji may 
be a step in the right direction in the search for the holy grail of a sufficient 
condition for justification. 


6.6 Symmetry and Nontransitivity 


In this chapter we have discussed the two conceptual objections to infinite 
epistemic chains that occur most frequently in the literature, the no start- 
ing point objection and the reductio argument, and we argued that they lose 
their bite when justification is seen as something that involves probabilistic 
support rather than entailment. Since probabilistic support is not enough for 
justification, we looked in the previous section at two candidates for add-ons. 
One could however raise objections to the very concept of probabilistic
support itself. It is after all the child of a theory that is beset by a number of 
serious pitfalls: the problem of old evidence, the problem of spurious rela- 
tions, of irrelevant conjunctions, of randomness, and more. 

Whenever a theory encounters problems, either we reject it because the 
problems are too serious, or we continue to use it, trying in the meantime to 
put things right. In the case of Kolmogorovian probability theory the choice 
seems clear. Aside from exotics such as quantum probability and Robin- 
sonian nonstandard analysis, Kolmogorov’s calculus is very much the only 
game in probability town. When in epistemology we say that one proposition 
‘probabilifies’ another, it would be wise to take Kolmogorov’s system seri- 
ously, at least until we have found a better interpretation of ‘probabilifies’. 

This book is not the place to dwell on all the snags and hitches of Kol- 
mogorovian probability. Yet there are two properties of the concept of proba- 
bilistic support that require some further consideration, since epistemologists 
may find them troublesome in the context of epistemic justification. The first 
is the fact that probabilistic support is not transitive and the second one is 
that it is symmetric. 

Many epistemologists have explicitly or implicitly expressed the view that
epistemic justification is transitive: if $A_n$ is justified by $A_{n+1}$ and $A_{n+1}$ is justified
by $A_{n+2}$, then $A_n$ is justified by $A_{n+2}$. Such a view is of course apposite
if justification is perceived as entailment or implication, for then justification 
is transmitted unchanged from one proposition to another. But if justification 
is understood as involving probabilistic support, then transitivity may be vi- 
olated. It all depends on what must be added to the relation of probabilistic 
support to yield that of justification. For example, if justification were equiv- 
alent to probabilistic support plus the Markov condition, then justification 
would be transitive, since transitivity is a property of probabilistic support 
when the Markov restriction is in place. If however justification were equiv- 
alent to probabilistic support plus a threshold condition, then it would not be 
transitive. As we have made clear, we refrain from making any claims about 
what has to be added to probabilistic support in order to yield justification. 
The point to make here is just that probabilistic support as a necessary con- 
dition for justification entails nothing about the transitivity of justification. 
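A small toy model makes the failure of transitivity concrete. It is a hypothetical example chosen for illustration and is not drawn from the literature discussed here. It uses two independent fair coins, $X$ and $W$. $X$ plays the role of $A_{n+2}$, "$X$ or $W$" plays $A_{n+1}$, and "$W$ and not $X$" plays $A_n$. Support is taken in the qualitative sense $P(\text{target}|\text{evidence}) > P(\text{target})$. The last line checks that the Markov condition fails in this model. That fits the remark above that transitivity is secured when the Markov condition holds.

```python
from fractions import Fraction
from itertools import product
from operator import itemgetter

# Hypothetical toy model: two independent fair coins X and W give four equally
# likely worlds.  X plays A_{n+2}, Y = (X or W) plays A_{n+1},
# and Z = (W and not X) plays A_n.
WORLDS = [
    {"X": x, "Y": x or w, "Z": w and not x}
    for x, w in product([True, False], repeat=2)
]

X, Y, Z = itemgetter("X"), itemgetter("Y"), itemgetter("Z")


def prob(event, given=None):
    """P(event | given), computed by counting equally likely worlds."""
    worlds = [w for w in WORLDS if given is None or given(w)]
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))


def supports(evidence, target):
    """Qualitative probabilistic support: P(target | evidence) > P(target)."""
    return prob(target, evidence) > prob(target)


print("A_{n+2} supports A_{n+1}:", supports(X, Y))  # P(Y|X) = 1 > 3/4
print("A_{n+1} supports A_n:    ", supports(Y, Z))  # P(Z|Y) = 1/3 > 1/4
print("A_{n+2} supports A_n:    ", supports(X, Z))  # P(Z|X) = 0 < 1/4

# Markov (screening-off) check: P(Z | Y and X) differs from P(Z | Y).
print(prob(Z, lambda w: Y(w) and X(w)), "vs", prob(Z, Y))
```

Exact fractions are used so that the comparisons of probabilities are not disturbed by rounding.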

A similar argument applies to the required asymmetry of justification.
When considered qualitatively, probabilistic support is symmetrical: if $A_{n+1}$
supports $A_n$, then $A_n$ supports $A_{n+1}$. However, from the fact that probabilistic
support is (qualitatively) symmetric, it does not follow that justification
is qualitatively symmetric as well. An argument parallel to the one
just given about transitivity shows that the symmetry of probabilistic support
entails nothing about the symmetry of justification. In fact the example of the
Markov condition fits the bill here, too. For if $A_{n+1}$ supports $A_n$, and $A_{n+1}$
screens off $A_n$ from all ‘ancestor’ propositions in the chain, i.e. $A_m$ where
$m > n+1$, then $A_n$ will in general not screen off $A_{n+1}$ from all ‘descendant’
propositions, i.e. $A_m$ where $m < n$. Thus if justification were equivalent
to probabilistic support plus the Markov condition, it would not be qualitatively
symmetric. As we stressed above, the Markov model is not meant to
be taken as a serious candidate as to how justification should be defined: it
merely shows that justification can be asymmetric, even though probabilistic
support is symmetric. A formal demonstration of this fact is as follows.
Consider these three statements:


(1) if $A_{n+1}$ justifies $A_n$, then $A_{n+1}$ probabilistically supports $A_n$

(2) if $A_{n+1}$ probabilistically supports $A_n$, then $A_n$ probabilistically supports $A_{n+1}$

(3) if $A_{n+1}$ justifies $A_n$, then $A_n$ justifies $A_{n+1}$.


The point is that (3) does not follow from (1) and (2). What does follow
from the latter two statements is:


(3’) if $A_{n+1}$ justifies $A_n$, then $A_{n+1}$ probabilistically supports
$A_n$ and $A_n$ probabilistically supports $A_{n+1}$.


The consequent of (3’) expresses the fact that probabilistic support is symmetric.
But this does not mean that justification is symmetric; it does not
follow from this that $A_n$ justifies $A_{n+1}$.
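Statement (2), the qualitative symmetry of probabilistic support, can be verified in one line from the definition of conditional probability. The check assumes that $P(A_n)$ and $P(A_{n+1})$ are both nonzero:

$$P(A_n|A_{n+1}) > P(A_n) \iff P(A_n \wedge A_{n+1}) > P(A_n)\,P(A_{n+1}) \iff P(A_{n+1}|A_n) > P(A_{n+1}).$$

The middle inequality is symmetric in $A_n$ and $A_{n+1}$, and that is all the symmetry of support amounts to. As the argument above shows, it does not carry over to justification.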

It is important to note that the matter is quite different with respect to 
fading foundations. The effect of fading foundations is not a property like 
transitivity or symmetry. As a result, it does follow that justification implies 
the existence of fading foundations (within the usual class). In detail: 


(1”) if $A_{n+1}$ justifies $A_n$, then $A_{n+1}$ probabilistically supports $A_n$

(2”) if $A_{n+1}$ probabilistically supports $A_n$, and the conditional probabilities
belong to the usual class, then fading foundations ensue

(3”) if $A_{n+1}$ justifies $A_n$, and the conditional probabilities belong to the
usual class, then fading foundations ensue.


In this case (3”) does follow from (1”) and (2”). Irrespective of whether
we are talking about probabilistic support or about epistemic justification,
the phenomenon of fading foundations is the same, the reason being that fading
foundations have no meaning independent of probability theory, which we
take to be necessary for justification: if there is no probabilistic support, then
there is no justification. The properties of transitivity and symmetry, on the
other hand, do not need to refer to probability theory in order to have the
meanings that they have. Thus under justification the influence of the probability
of the ground on the probability of the target decreases as the number
of links in the chain increases. And in the limit that the number of links
goes to infinity, this probabilistic influence vanishes completely, leaving the
probability of the target fully independent of the probability of the ground.
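The fading of the ground's influence can be sketched numerically. The example assumes a chain in which each link obeys the rule of total probability, written as $P(A_k) = \beta_k + \gamma_k P(A_{k+1})$. Here $\alpha_k = P(A_k|A_{k+1})$, $\beta_k = P(A_k|\neg A_{k+1})$, and we write $\gamma_k$ for $\alpha_k - \beta_k$. The particular values of $\alpha_k$ and $\beta_k$ are hypothetical. They are drawn at random with $\beta_k < \alpha_k$, so that every link gives probabilistic support, and with $\gamma_k \le 0.55$. The function name is illustrative.

```python
import random


def chain_coefficients(alphas, betas):
    """Unroll P(A_k) = beta_k + gamma_k * P(A_{k+1}), gamma_k = alpha_k - beta_k,
    from the target q = A_0 down to the ground A_n (n = len(alphas)).

    Returns (constant, weight) such that P(q) = constant + weight * P(A_n).
    """
    constant, weight = 0.0, 1.0
    for alpha, beta in zip(alphas, betas):
        constant += weight * beta
        weight *= alpha - beta
    return constant, weight


# Hypothetical conditional probabilities: beta_k < alpha_k (each link supports
# the one below it) and 0.10 <= gamma_k <= 0.55.
rng = random.Random(0)
betas = [rng.uniform(0.05, 0.40) for _ in range(50)]
alphas = [b + rng.uniform(0.10, 0.55) for b in betas]

for n in (1, 5, 10, 20, 50):
    constant, weight = chain_coefficients(alphas[:n], betas[:n])
    # Whatever value the ground P(A_n) takes in [0, 1], P(q) lies in this interval.
    print(f"n={n:2d}  weight of ground {weight:.2e}  "
          f"P(q) in [{constant:.4f}, {constant + weight:.4f}]")
```

The printed weight is the product $\gamma_0\gamma_1\cdots\gamma_{n-1}$. Once it is negligible, $P(q)$ is fixed by the conditional probabilities alone, whatever value the ground is given.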
Chapter 7
Higher-Order Probabilities 


Abstract 

At first sight, a hierarchical regress formed by probability statements about 
probability statements appears to be different from the probabilistic regress 
of the previous chapters. After all, the former involves higher and higher- 
order probabilities, whereas the latter is an epistemic chain in which one 
proposition or belief probabilistically supports another. Closer examination, 
however, teaches us that the two regresses are in fact isomorphic. A model 
based on coin-making machines demonstrates that the hierarchical regress is 
consistent. 


7.1 Two Probabilistic Regresses 


We have extensively discussed chains of propositions which probabilistically
support one another. But in Chapter 3 we did mention that Lewis, and independently
Russell, seemed sometimes to be talking about higher-order probability
statements rather than about straightforward chains of propositions.1
The ambiguity is understandable enough. As we have seen, both Lewis and
Russell took the view that probability statements like ‘$q$ is probable’ or ‘the
probability of $q$ is $x$’ only make sense if one assumes that something else is

1 Section 3.2, footnote 8, and Section 3.3, footnote 22. Cf. Reichenbach 1952, 151,
where mention is also made of a probability of a probability. Roderick Chisholm has 
taken issue with Reichenbach’s idea (especially as it is expressed in Reichenbach 
1938), but in turn received criticism from Bruce Aune (Chisholm 1966, 22 ff, Aune 
1972). 
certain. The question then arises what exactly this ‘something else’ could be,
and two answers appear to be natural. 

According to the first, the ‘something else’ is the reference class on the
basis of which the unconditional probability of $q$ is determined. Lewis, we
recall, argued that ‘the probability of $q$ is $x$’ is in fact elliptical for ‘the probability
of $q$ is $x$, on condition that $A_1$’. In many cases $A_1$ will be assumed to
be certain, and thus to have probability unity. If this assumption is not made,
then one has to assume that $A_2$ is certain in ‘the probability of $A_1$ is $x$, on condition
that $A_2$’. This reasoning forms the background to Lewis’s conclusion
that a regress of probability statements only makes sense if it is rooted in a
certainty. According to the second answer, however, it is the entire probability
statement that is taken to be certain. In asserting ‘the probability of $q$ is
$x$’, one usually presupposes that this assertion itself has probability unity. If
one does not, then one might assume that the assertion that the probability
of this assertion is $y$ (with $y$ smaller than one) is certain. In other words, with
the abbreviation of ‘the probability of $q$ is $x$’ as $A_1$, one way in which $A_1$
could fail to be certain is if the assertion ‘the probability of $A_1$ is $y$’
(call this assertion $A_2$) has probability one.

These two answers lead to two different readings of a probabilistic regress. 
According to the first, the regress states (with $v_n$ standing for the unconditional
probability values):


the probability of $q$, on condition that $A_1$ is true, is $v_0$;
the probability of $A_1$, on condition that $A_2$ is true, is $v_1$;
the probability of $A_2$, on condition that $A_3$ is true, is $v_2$;
and so on.


According to the second reading, the regress amounts to:


$A_1$: the probability of $q$ is $v_0$;
$A_2$: the probability of $A_1$ is $v_1$;
$A_3$: the probability of $A_2$ is $v_2$;
and so on.


In the first kind of regress every $A_n$ represents a condition on the probability
of $q$ or on that of $A_{n-1}$. In the presence of such a regress, as we have
seen, we are generally able to determine the unconditional probability of $q$
via an infinite iteration of the rule of total probability. In fact, as we have
explained, the iteration need not even be infinite in order for us to compute
the unconditional probability of $q$ to an acceptable approximation. However,
in the second regress every $A_n$ names a statement about a probability. It thus
involves infinitely many statements about ever higher-order probabilities,
whereas the first regress refers to an infinite number of conditions.

Up to this point we have concentrated on the first kind of regress. In this 
chapter we shall focus on probabilistic regresses of the second kind, cul- 
minating in infinite series of probability statements about probability state- 
ments. We start in Section 7.2 by discussing probability statements of second 
and higher order. We will see that, although second-order probabilities do not 
pose any particular problem, many philosophers have objected to probability 
statements of a higher than second order. Especially the indefinite accumu- 
lation of probabilities to infinity has been generally regarded as not making 
sense. 

In Section 7.3 we discuss an objection that Nicholas Rescher made to 
infinite-order probabilities. Our analysis of Rescher’s argument will reveal 
that the above mentioned two readings of a probabilistic regress are in fact 
isomorphic, and in 7.4 this isomorphy will be demonstrated in a more for- 
mal way. Since regresses under the first reading are coherent, the isomor- 
phy tells us that those under the second reading are too. Thus the proper- 
ties of regresses under the first reading, such as those of fading foundations 
and emerging justification, are also properties of regresses under the second 
reading. In Section 7.5 we make the concept of infinite-order probability 
statements explicit by describing an executable model. 


7.2 Second- and Higher-Order Probabilities 


Suppose that the probability of the target proposition $q$ is $v_0$:

$$P(q) = v_0. \tag{7.1}$$

If we know that (7.1) is true, then there is no more to be said; but what if we
lack this knowledge? In that case, we may only be in a position to assert a
further probabilistic statement like

$$P\big(P(q) = v_0\big) = v_1, \tag{7.2}$$

which is a second-order probability statement, saying that the probability that
(7.1) is true is $v_1$. Does (7.2) make sense? It can be argued that it does not. For
if one supposes that (7.1) implies that $P(q) = v_0$ is true, then $P(P(q) = v_0) = 1$,
and so, unless $v_1 = 1$, (7.2) would be inconsistent with (7.1). A way to
avoid such an inconsistency would be to introduce two different probability
functions instead of one, viz. $P^{(1)}$ and $P^{(2)}$. For evidently the intention is that
(7.2) should adjust the initial bald statement (7.1). Thus we need to replace
(7.1) and (7.2) by

$$\begin{aligned}
P^{(1)}(q) &= v_0\\
P^{(2)}\big(P^{(1)}(q) = v_0\big) &= v_1,
\end{aligned} \tag{7.3}$$

where $P^{(2)}$ is a second-order probability function.

However, objections have been raised against second-order functions like
$P^{(2)}$, based on the contention that it is unclear what they mean. David Miller
even argued that they lead to an absurdity.2 In his view the only way second-order
probability statements could make sense, if at all, would be if the
second-order probability of $q$, given that the first-order probability of $q$ is $v_0$, is
itself $v_0$:

$$P^{(2)}\big(q \,\big|\, P^{(1)}(q) = v_0\big) = v_0. \tag{7.4}$$

He then goes on to argue that (7.4) leads to an unacceptable conclusion. For
if we replace $v_0$ in (7.4) by $P^{(1)}(\neg q)$, we obtain

$$P^{(2)}\big(q \,\big|\, P^{(1)}(q) = P^{(1)}(\neg q)\big) = P^{(1)}(\neg q),$$

which is the same thing as

$$P^{(2)}\big(q \,\big|\, P^{(1)}(q) = \tfrac{1}{2}\big) = P^{(1)}(\neg q),$$

since $P^{(1)}(q) = P^{(1)}(\neg q)$ holds just in case both equal one-half.
However, if instead we put $\tfrac{1}{2}$ for $v_0$ in (7.4), we find $P^{(2)}\big(q \,|\, P^{(1)}(q) = \tfrac{1}{2}\big) = \tfrac{1}{2}$.
Therefore $P^{(1)}(\neg q) = \tfrac{1}{2}$, and thus $P^{(1)}(q) = \tfrac{1}{2}$. So if (7.4) were unrestrictedly
valid, we could prove that the probability of an arbitrary proposition $q$ is
equal to one-half, which is absurd. This is known as the Miller paradox.
Brian Skyrms has argued against Miller’s reasoning. Although Skyrms
maintains that (7.4) is perfectly acceptable, playfully dubbing it ‘Miller’s
Principle’, he points out that Miller’s further reasoning is fallacious, since
it “rests on a simple de re-de dicto confusion”.3 As Skyrms explains, one
and the same expression is used both referentially and attributively, so that
a number (here $v_0$) is wrongly put on a par with a random variable (here
$P^{(1)}(\neg q)$) that takes on a range of possible values.4 So long as we recognize
this confusion and keep the two levels apart, the notion of a second-order
probability is harmless, and the Miller paradox disappears. We agree with


2 Miller 1966.
3 Skyrms 1980, 111.
4 See Howson and Urbach 1993, 399-400, who make a similar observation.
Skyrms that Miller’s Principle as such is harmless, but in what follows we
will not need it: our reasoning goes through without the principle. 

In addition to parrying Miller’s argument, Skyrms warded off another ob- 
jection to second-order probability statements, namely one that can be dis- 
cerned in de Finetti’s work. As is well known, de Finetti held that probability 
judgements are expressions of attitudes that lack truth values. Skyrms how- 
ever pointed out that de Finetti’s work is not particularly hostile to a theory 
of second-order probabilities: 


For a given person and time there must be, after all, a proposition to the effect 
that that person then has the degree of belief that he might evince by uttering 
a certain probability attribution. 

De Finetti grants as much: 


The situation is different of course, if we are concerned not with the 
assertion itself but with whether ‘someone holds or expresses such an 
opinion or acts according to it, for this is a real event or proposition. 
(de Finetti 1972, 189) 


With this, de Finetti grants the existence of propositions on which a theory
of higher-order personal probabilities can be built, but never follows up this
possibility.5


De Finetti and Skyrms are not alone in having taken the view that second- 
order probabilities need not pose any particular problem. Several other au- 
thors recognize that, when the relevant distinctions are taken into account, 
second-order probabilities can be shown to be formally consistent. This is 
not to say that such probabilities are mandatory. As Pearl has explained,
second-order probabilities, although consistent, can be dispensed with, for
one can always express them by using a richer first-order probability space.7

Once we accept the cogency of second-order probabilities, there is no
impediment to constructing probabilities to any finite order. We could continue
the sequence (7.3) and introduce a hierarchy of higher-order probability
statements:

$$\begin{aligned}
P^{(1)}(q) &= v_0\\
P^{(2)}\big(P^{(1)}(q) = v_0\big) &= v_1\\
P^{(3)}\Big(P^{(2)}\big(P^{(1)}(q) = v_0\big) = v_1\Big) &= v_2
\end{aligned} \tag{7.5}$$


5 Skyrms 1980, 113-114.
6 Uchii 1973; Lewis 1980; Domotor 1981; Kyburg 1987; Gaifmann 1988.
7 Pearl 2000.
8 See Atkinson and Peijnenburg 2013.
and so on, with $P^{(m)}$ being the $m$th-order probability. In the previous section
we introduced $A_1$, $A_2$ and $A_3$ as names of probability statements. Here
we shall specify the probabilities in question more fully by stipulating their
orders:


$A_1$ is the proposition $P^{(1)}(q) = v_0$
$A_2$ is the proposition $P^{(2)}(A_1) = v_1$
$A_3$ is the proposition $P^{(3)}(A_2) = v_2$,


and so on. With these definitions, (7.5) can be written as

$$\begin{aligned}
P^{(1)}(q) &= v_0\\
P^{(2)}(A_1) &= v_1\\
P^{(3)}(A_2) &= v_2
\end{aligned} \tag{7.6}$$

and so on.

However, the key question is of course not whether any finite series of
higher-order probability statements is cogent, but whether the notion of
infinite-order probabilities makes sense. Is it coherent to continue the above
sequence ad infinitum, in the limit defining a probability, $P^{(\infty)}(q)$, of infinite
order? Leonard Savage has answered this question in the negative. For
him, the mere fact that second-order probabilities provoke the introduction
of probability statements of infinite order was enough to discard them altogether:


Once second order probabilities are introduced, the introduction of an endless
hierarchy seems inescapable. Such a hierarchy seems very difficult to interpret,
and it seems at best to make the theory less realistic, not more.9


His conclusion is that “insurmountable difficulties” will arise if one opens
the door to second-order probabilities and starts using such phrases as “the
probability that B is more probable than C is greater than the probability that
F is more probable than G”.10

Savage was mainly talking about statistics, but in philosophy too it has 
been argued that an infinite order of probabilities of probabilities leads to 
problems that are insuperable. Thus David Hume argued in A Treatise of 
Human Nature that an infinite hierarchy implies that the probability of the 
target will always be zero: 


9 Savage 1954, 58.
10 Ibid.
Having thus found in every probability ... a new uncertainty ... and having
adjusted these two together, we are oblig’d ... to add a new doubt .... This is
a doubt ... of which ... we cannot avoid giving a decision. But this decision,
... being founded only on probability, must weaken still further our first evidence,
and must itself be weaken’d by a fourth doubt of the same kind, and
so on in infinitum: till at last there remain nothing of the original probability,
however great we may suppose it to have been, and however small the
diminution by every new uncertainty.11


Nicholas Rescher, in his book Infinite Regress: The Theory and History of 
Varieties of Change, also argued against an infinite hierarchy of probabilities. 
As he sees it, the problem with such a hierarchy is not that the probability 
of the target q will always be zero, but rather that it becomes impossible to 
know what that probability is: 


... unless some claims are going to be categorically validated and not just ad- 
judged probabilistically, the radically probabilistic epistemology envisioned 
here is going to be beyond the prospect of implementation. ...If you can 
indeed be certain of nothing, then how can you be sure of your probability 
assessments. If all you ever have is a nonterminatingly regressive claim of the 
format ...the probability is .9 that (the probability is .9 that (the probability 
of q is .9)) then in the face of such a regress, you would know effectively 
nothing about the condition of q. After all, without a categorically established 
factual basis of some sort, there is no way of assessing probabilities. But if 
these requisites themselves are never categorical but only probabilistic, then 
we are propelled into a vitiating regress of presuppositions.12


11 Hume 1738/1961, Book I, Part IV, Section I. See also Lehrer 1981, for similar
reasoning. As we noted in Section 3.3, Quine states in his lectures on Hume
that this Humean argument is incorrect, since an infinite product of factors, all less
than one, can be convergent, yielding a non-zero probability for the target (Quine
2008). Quine is right to point out this possibility, but note that it corresponds to
what happens in our exceptional class, not in the usual class. Moreover, the possibility
can only serve as a critique of Hume if one forgets about the second term
in the rule of total probability, i.e. if all the $\beta_n = P(A_n|\neg A_{n+1})$ are zero. As we
have seen, Hume does indeed leave out that term, as would Lewis and Russell many
years later. Hume’s argument is therefore not generally valid. Thus Quine’s analysis
of Hume is based on two unwarranted assumptions: first he assumes that all the
conditional probabilities $\alpha_n$ belong to the exceptional class, i.e. the class of
quasi-bi-implication, and second he supposes that all the conditional probabilities $\beta_n$ are
zero. What diminishes as the chain lengthens is not the probability of the target, as
Hume and Quine thought, but rather the incremental changes that distant links bring
about.

12 Rescher 2010, 36-37. Rescher has p rather than q. Furthermore, Rescher explic- 
itly conditions all his probabilities with respect to some evidence, E, and therefore 
The argument of Rescher may seem plausible and persuasive. Yet we shall
argue in the next section that an endless hierarchy of probabilities is in fact no
stumbling block to having effective knowledge about the probability that $q$
is true, let alone that it leads to “insurmountable difficulties”, as Savage
would have it. In a sense the opposite is the case. If there is a stumbling
block, it resides in the finite, not in the infinite hierarchy. For an infinite 
hierarchy of probabilities is, to a certain extent, better equipped to reveal 
the probability of q than is a finite one. The reason is reminiscent of the 
reason why a probabilistic regress of the sort that we have investigated in the 
previous chapters is cogent: in order to compute an infinite sequence, only 
the conditional probabilities need be known, whereas the computation of a 
finite sequence requires also knowledge of an unconditional probability. 


7.3 Rescher’s Argument 


In this section we shall examine Rescher’s claim: “If all you ever have is a 
nonterminatingly regressive claim of the format ...the probability is .9 that 
(the probability is .9 that (the probability of q is .9)) then in the face of such a 
regress, you would know effectively nothing about the condition of q”, which 
amounts to putting vo, vı and vz in (7.6) all equal to 0.9. We will show in this 
section that Rescher’s assertion is in fact ill-founded. 
Imagine, following Rescher, that we have a probability statement of the
third order:

$$P^{(3)}\Big(P^{(2)}\big(P^{(1)}(q) = 0.9\big) = 0.9\Big) = 0.9. \tag{7.7}$$

Some philosophers conclude on the basis of (7.7) that the unconditional
probability of $q$ is 0.9, since no matter how many times one iterates, the
probability value always stays the same.13 This conclusion is also incorrect,
but the question remains as to what is the correct conclusion that can be
drawn from (7.7) about the unconditional probability of $q$.

Consider the definitions


$A_1$: The first-order probability $P^{(1)}$ of $q$ is 0.9,
$A_2$: The second-order probability $P^{(2)}$ of $A_1$ is 0.9,
$A_3$: The third-order probability $P^{(3)}$ of $A_2$ is 0.9,


instead of Eqs. (7.1) and (7.2) he has $\Pr(p|E) = v_0$ and $\Pr(\Pr(p|E) = v_0|E) = v_1$ (ibid.,
36; misprint corrected). In the interest of notational brevity, explicit reference to
$E$ will be suppressed.

13 See for example DeWitt 1985, 128.
and so on. In the rest of this section we will temporarily suppress the orders
of the probabilities to facilitate an intuitive grasp of the course of the
reasoning. So we have:


$A_1$: The probability of $q$ is 0.9,
$A_2$: The probability of $A_1$ is 0.9,
$A_3$: The probability of $A_2$ is 0.9,


and so on. We will now successively revise $P(q)$; in the next section, when
we approach the matter more formally, we will reinstate the higher orders.
We call on the rule of total probability,

$$P(q) = P(q|A_1)P(A_1) + P(q|\neg A_1)P(\neg A_1), \tag{7.8}$$

in which the probability of $q$ is conditioned on $A_1$. In order to evaluate
the unconditional probability of $A_1$, this formula must be repeated in the
familiar way, with $A_1$ in the place of $q$, and $A_2$ in the place of $A_1$,

$$P(A_1) = P(A_1|A_2)P(A_2) + P(A_1|\neg A_2)P(\neg A_2), \tag{7.9}$$

and so on. Is it possible to calculate $P(q)$ if the format goes on to infinity?
Rescher thinks not. If the hierarchy is endless one cannot know anything
about the probability of $q$, for “we are propelled into a vitiating regress of
presuppositions”.14 The situation looks like a probabilistic analogue of the
Tortoise’s interminable query to Achilles, where the latter successively satisfies
the former pro tem in higher and higher-order querulousness without
end.15

However, this similarity is only apparent. Between the probabilistic and
the nonprobabilistic version of the Tortoise’s challenge to Achilles there is an
essential difference: the latter might be hopeless, the former is not. It is true
that the Tortoise can always ask about an unknown $P(A_n)$ after the weary
warrior has taken $n$ steps in his argument. It is also true that the unknown
$P(A_n)$ could have any value between zero and one. However, the influence
that $P(A_n)$ has on the value of $P(q)$ will be smaller as the distance between $A_n$
and $q$ gets bigger, even if $P(A_n)$ were to take on the largest allowed value
of 1 (see Section 4.3). As we know now, in the limit that $n$ tends to infinity,
the influence of $P(A_n)$ on $P(q)$ will peter out completely, leaving the value
of $P(q)$ as a function of the conditional probabilities alone. Note again that
this is not because $P(A_n)$ itself becomes smaller as $n$ becomes larger: indeed,


14 Rescher 2010, 37. 
15 Carroll 1895. 
it may not do so. Nor is it simply because the iteration of (7.8) and (7.9),
etc. leads to a series of terms that is convergent. Rather it is because $P(A_n)$
is multiplied by a factor that goes to zero as $n$ tends to infinity. Each time
Achilles has taken one more step, and the Tortoise has asked about $P(A_{n+1})$,
this worrisome probability is multiplied by an even smaller factor, and after
yet another step the Tortoise’s $P(A_{n+2})$ is multiplied by a yet smaller factor
still, and so on, until the factor has shrunk to zero.

Referring back to (7.8), we know from Miller’s Principle that the term
$P(q|A_1)$ is equal to 0.9. In Rescher’s example, the third term, $P(q|\neg A_1)$, is
not specified, but it will be clear that (7.8) cannot be evaluated without it: as
long as the value of the third term is unknown, one cannot determine $P(q)$.
For the sake of argument, we shall set this term equal to 0.3. It should be
noted that no strings are attached to this choice of 0.3, since the argument is
robust: whatever nonzero value of $P(q|\neg A_1)$ is chosen, so long as it is less
than $P(q|A_1)$, the same reasoning will work.

Now (7.8) can be worked out: 


P(q) = [0.9 x 0.9] + [0.3 x 0.1] = 0.84. (7.10) 


The number 0.84 was arrived at on the provisional assumption that the second
term, P(A1), indeed equals 0.9, which would be correct if it were the
case that


P(P(A1) = 0.9) = P(A2) = 1.


But that is wrong, for P(A2) = 0.9. This means that P(A1) should rather be

P(A1) = [0.9 x 0.9] + [0.3 x 0.1] = 0.84,   (7.11)


where, similarly, 0.3 is taken to be the value of P(A1|¬A2), and so on. On the
basis of this new result, the value of P(q) in (7.10) must be revised, yielding


P(q) = [0.9 x 0.84] + [0.3 x 0.16] = 0.804. (7.12) 


However, the number 0.804 was arrived at on the fictional assumption that 
the second term in (7.11), to wit P(A2), indeed equals 0.9, and thus that 


P(P(A2) = 0.9) = P(A3) = 1. 


But that is also wrong, for P(A3) = 0.9. This means that P(A2) should rather 
be 
P(A2) = [0.9 x 0.9] + [0.3 x 0.1] = 0.84. (7.13) 
On the basis of this, P(A1) is revised to

P(A1) = [0.9 x 0.84] + [0.3 x 0.16] = 0.804.   (7.14)

This new value for P(A1) implies that P(q) must again be revised, generating

P(q) = [0.9 x 0.804] + [0.3 x 0.196] = 0.7824, (7.15) 


and so on. It should be noted that these ‘revisions’ of the value of P(q) are
really probabilities of q of ever higher order. We have suppressed the
specification of the orders for greater readability: in the next section the technique
will be explained with more care and with greater generality.

Here is an overview of the values that P(q) takes after an increasing num- 
ber of revisions: 


Table 7.1 Unconditional probability of q after n revisions

n       1      2       3        5        10       15        20         ∞
P(q)    0.84   0.804   0.7824   0.7617   0.7509   0.75007   0.750005   3/4
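Every entry in Table 7.1 comes from the same step. The most recently revised probability is fed back into the rule of total probability, with 0.9 and 0.3 as the conditional probabilities. The sketch below illustrates this under exactly the assumptions made in the text: every P(A_n|A_{n+1}) is 0.9, every P(A_n|¬A_{n+1}) is 0.3, and the deepest unconditional probability is provisionally set to 0.9. The function name is hypothetical.

```python
# Sketch: reproduce Table 7.1 by repeatedly applying the rule of total
# probability. Assumptions (as in the text): P(A_n | A_{n+1}) = 0.9,
# P(A_n | not A_{n+1}) = 0.3, and the deepest unconditional probability
# is provisionally 0.9.

ALPHA = 0.9  # conditional probability given the next proposition
BETA = 0.3   # conditional probability given the negation of the next proposition
START = 0.9  # provisional value of the deepest unconditional probability


def revised_p_q(n: int) -> float:
    """Value of P(q) after n revisions."""
    p = START
    for _ in range(n):
        # P(A_k) = P(A_k|A_{k+1}) P(A_{k+1}) + P(A_k|~A_{k+1}) (1 - P(A_{k+1}))
        p = ALPHA * p + BETA * (1.0 - p)
    return p


for n in (1, 2, 3, 5, 10, 15, 20):
    print(f"{n:>3}  {revised_p_q(n):.6f}")
```

Each revision shrinks the distance to 3/4 by the factor 0.9 − 0.3 = 0.6. After n revisions the value is therefore 3/4 + 0.15 × 0.6^n. For example, n = 5 gives 0.761664, which the table rounds to 0.7617.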


There are three important lessons to be drawn from these seemingly tedious 
calculations. 

The first is that an endless hierarchy of probabilities can indeed deter- 
mine what the probability of the original proposition is — contrary to what 
Rescher and many others have claimed. For it is possible to calculate the 
value of P(q), even in a situation such as the one sketched by Rescher, where 


P(P(P(q) = 0.9) = 0.9) = 0.9,   (7.16)


and so on. With the value that was chosen for $P(A_n|\neg A_{n+1})$, namely 0.3, and
after an infinite number of revisions, P(q) is exactly equal to 3/4.

The second lesson is that an infinite number of revisions is not needed to 
come very close to the actual value of P(q). For, as can be seen in Table 7.1, 
there is only a small difference between the value of P(q) after, say, twenty 
revisions and after an infinite number of them. Of course, the size of the
difference will depend on the numbers that are chosen for the conditional
and unconditional probabilities in the equations. Suppose the values of the first
two terms had been, for example, 0.8 rather than 0.9, and $P(A_n|\neg A_{n+1})$ had been
0.4 rather than 0.3. Then fewer than twenty steps would have sufficed to
come as close to the limit value, which would have been 2/3 in that case.
There is always some finite number of revisions such that the result scarcely
differs from what is obtained with an infinite number of them.
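The second lesson can be checked with exact arithmetic. Suppose every link in a chain has the same pair of conditional probabilities (α, β). Then each revision moves P(q) towards the fixed point of p = β + (α − β)p, namely β/(1 − α + β). The sketch below confirms the limits 3/4 and 2/3. It also counts how many revisions the (0.8, 0.4) chain needs to come as close to its limit as the (0.9, 0.3) chain does after twenty. As in the text, the starting value equals the first conditional probability. The sketch uses plain Python with `fractions.Fraction`, and the function names are hypothetical.

```python
from fractions import Fraction as F


def limit_p_q(alpha, beta):
    """Fixed point of p -> beta + (alpha - beta) * p (requires alpha - beta < 1)."""
    return beta / (1 - (alpha - beta))


def revise(alpha, beta, start, n):
    """Value of P(q) after n revisions, starting from the provisional value."""
    p = start
    for _ in range(n):
        p = alpha * p + beta * (1 - p)
    return p


def revisions_needed(alpha, beta, start, tol):
    """Smallest n whose revised value lies within tol of the limit."""
    target = limit_p_q(alpha, beta)
    p, n = start, 0
    while abs(p - target) > tol:
        p = alpha * p + beta * (1 - p)
        n += 1
    return n


# Distance from the limit after twenty revisions of the (0.9, 0.3) chain.
tol_20 = abs(revise(F(9, 10), F(3, 10), F(9, 10), 20) - limit_p_q(F(9, 10), F(3, 10)))
print(limit_p_q(F(9, 10), F(3, 10)))                              # 3/4
print(limit_p_q(F(8, 10), F(4, 10)))                              # 2/3
print(revisions_needed(F(8, 10), F(4, 10), F(8, 10), tol_20))     # 12
```

The (0.8, 0.4) chain converges faster because its contraction factor α − β is 0.4 rather than 0.6.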
The point can be regarded as a quantitative reinforcement of a claim that
Rescher makes in qualitative terms. Partly in the wake of Kant and Peirce, 
Rescher stresses several times that some infinite regresses should be ap- 
proached in a pragmatic way, in which it is acknowledged that contextual 
factors play an important role and that, at a certain point, “enough is enough”: 


...in any given context of deliberation the regress of reasons ultimately runs 
out into ‘perfectly clear’ considerations which are (contextually) so plain that 
there just is no point in going further. It is not that the regress of validation 
ends, but rather that we stop tracking it because in the circumstances there is 
no worthwhile benefit to be gained by going on. We have rendered a state [or] 
situation by coming to the end not of what is possible but of what is sensible 
— not of what is feasible but of what is needed. Enough is enough.¹⁶


...in actual practice we need simply proceed ‘far enough’. After a certain 
point there is simply no need — or point — to going on.¹⁷


Our explanations, interpretations, evidentiations, and substantiations can al- 
ways be extended. But when we carry out these processes adequately, then 
after a while ‘enough is enough’. The process is ended not because it has to 
terminate as such, but simply because there is no point in going further. A 
point of sufficiency has been reached. The explanation is ‘sufficiently clear’, 
the interpretation is ‘adequately cogent’, the evidentiation is ‘sufficiently
convincing’. ...[T]ermination is not a matter of necessity but of sufficiency — of
sensible practice rather than of inexorable principle. ... What counts is doing
enough ‘for practical purposes’.¹⁸


... regressive viciousness in explanation can be averted ...by the considera- 
tion that the practical needs of the situation rather than considerations of gen- 
eral principle serve to resolve our problems here. . . . [I]n the end, what matters 
for rational substantiation is not theoretical completeness but pragmatic
sufficiency.¹⁹


Rescher’s point is a good one, and it can be buttressed by the reasoning above,
certainly in the case of an endless hierarchy of probabilities. Besides
practical reasons for deciding that ‘enough is enough’, principled considerations
can be used to determine when there is a negligible difference between the
value of P(q) after, say, fifteen steps, and after an infinite number of them.


16 Rescher 2010, 47.
17 Ibid., 82.
18 Rescher 2005, 104.
19 Ibid., 105.
Of course, it is on the basis of the context that the meaning of ‘negligible’
is to be understood. If one is happy to know what a particular probability is 
to within, say, one percent, then it is easy to work out, for given conditional 
probabilities, at what point the regress can be terminated, such that the error 
which is thereby committed is less than the desired one percent. 
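The one-percent remark can be made concrete under three assumptions of our illustration, not claims of the text:

- ‘within one percent’ means an absolute error below 0.01;
- all conditional probabilities are the uniform pair (α, β) with α > β;
- the deepest unconditional probability $P(A_n)$ is unknown and may lie anywhere in [0, 1].

Let γ = α − β. Truncating after n steps gives $\beta(1 + \gamma + \cdots + \gamma^{n-1}) + \gamma^n P(A_n)$. This differs from the limit $L = \beta/(1-\gamma)$ by $\gamma^n (P(A_n) - L)$, so the worst case over [0, 1] is $\gamma^n \max(L, 1 - L)$. The hypothetical helper below returns the first n at which that bound drops below the tolerance.

```python
def steps_for_tolerance(alpha: float, beta: float, tol: float) -> int:
    """Smallest n such that truncating the uniform regress after n steps
    errs by less than tol, whatever value the unknown P(A_n) has in [0, 1]."""
    gamma = alpha - beta
    if not 0 <= gamma < 1 or tol <= 0:
        raise ValueError("sketch assumes 0 <= alpha - beta < 1 and tol > 0")
    limit = beta / (1 - gamma)
    bound = max(limit, 1 - limit)  # worst-case |P(A_n) - limit| before any step
    n = 0
    while bound >= tol:
        bound *= gamma             # each extra step multiplies the error by gamma
        n += 1
    return n


def truncated_p_q(alpha: float, beta: float, n: int, deepest: float) -> float:
    """P(q) computed from n links when the last one has P(A_n) = deepest."""
    p = deepest
    for _ in range(n):
        p = alpha * p + beta * (1 - p)
    return p


print(steps_for_tolerance(0.9, 0.3, 0.01))  # 9
print(steps_for_tolerance(0.8, 0.4, 0.01))  # 5
```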

The third lesson, finally, must by now sound familiar: the further away $A_n$
is from $q$, the smaller is the influence that the former exerts on the latter, until
in the limit it dies out completely. In the end, the unconditional probabilities
do not affect the value of $P(q)$ at all; only the conditional probabilities matter.
Contrary to what Rescher suggests, the unconditional probability of $q$ can be
fully determined on the basis of the conditional probabilities, and of nothing
else.

Again, this could be interpreted as a strengthening rather than a critique 
of Rescher’s claims. At several places in his book Rescher explains that one 
of the ways in which an infinite regress can be harmless is when it is
subject to “compressive convergence”.²⁰ As he phrases it: “compressive
convergence can enter in to save the day for infinite regression” (ibid.). In regresses
governed by compressibility, “a law of diminishing returns” (ibid., 74) is in 
force, according to which the steps in the regress recede into “a minuteness 
of size” (ibid., 52): 


An infinite regress can thus become harmless when the regressive steps be- 
come vanishingly small in size so that the transit of regression becomes con- 
vergent. An ongoing approximation to a fixed result is then achieved, and the 
regress, while indeed proceeding in infinitum, does not reach ad infinitum. 


In the same vein, a law of diminishing returns can be said to be operating in 
the endless hierarchy of probabilities discussed above. Granted, it is not the 
case that in such a hierarchy the successive steps become smaller, let alone 
that they recede into “imperceptible minuteness”.²² Quite the contrary: in
the limit that $n$ goes to infinity, as has been shown, it is no impediment if
$P(A_n)$ tends to the highest possible value, namely 1. Nor is it the case that, in
the limit, $P(A_n)$ fades into penumbral obscurity in which its nature becomes
unclear, which according to Rescher is another way in which an infinite regress
can be harmless.²³ For the nature of the infinitely remote $P(A_n)$ may be
perfectly clear and well-defined. Nevertheless a law of diminishing returns can
still be said to be in force. Although the probability $P(A_n)$ does not shrink in
size, nor become dim or otherwise unclear, the influence of $P(A_n)$ on $P(q)$,
and thus the contribution that $P(A_n)$ makes to the value of $P(q)$, diminishes
as the distance between $A_n$ and $q$ increases. This is because the hierarchical
regress is isomorphic to the probabilistic regress, as we shall now prove.

20 Rescher 2010, 46.
21 Ibid., 48.
22 Ibid., 75.
23 Ibid., 52.


7.4 The Two Regresses Are Isomorphic 


In this section we will show that the regress of higher-order probabilities is
strictly equivalent to the familiar probabilistic regress of propositions.
Consider again (7.6). What is the second-order probability of $q$? It can be
obtained from an instantiation of the rule of total probability at the second
level:

$$\begin{aligned} P^{(2)}(q) &= P^{(2)}(q|A_1)P^{(2)}(A_1) + P^{(2)}(q|\neg A_1)P^{(2)}(\neg A_1) \\ &= \alpha_0^{(2)} v_1 + \beta_0^{(2)}(1 - v_1) \\ &= \beta_0^{(2)} + \gamma_0^{(2)} v_1, \end{aligned} \tag{7.17}$$

where $v_1$ was defined in (7.6), and

$$\alpha_0^{(2)} = P^{(2)}(q|A_1); \quad \beta_0^{(2)} = P^{(2)}(q|\neg A_1); \quad \gamma_0^{(2)} = \alpha_0^{(2)} - \beta_0^{(2)}. \tag{7.18}$$

According to Miller’s Principle in the form (7.4), $\alpha_0^{(2)}$ is equal to $v_0$; but
since we do not need to call on this principle for our purposes, we will let
$\alpha_0^{(2)}$ stand.

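The third-order probability of $q$ is given by

$$P^{(3)}(q) = P^{(3)}(q|A_1)P^{(3)}(A_1) + P^{(3)}(q|\neg A_1)P^{(3)}(\neg A_1), \tag{7.19}$$

but the probability of $A_1$ at third order is no longer $v_1$, as it was at second
order. Instead

$$\begin{aligned} P^{(3)}(A_1) &= P^{(3)}(A_1|A_2)P^{(3)}(A_2) + P^{(3)}(A_1|\neg A_2)P^{(3)}(\neg A_2) \\ &= \alpha_1^{(3)} v_2 + \beta_1^{(3)}(1 - v_2) \\ &= \beta_1^{(3)} + \gamma_1^{(3)} v_2, \end{aligned} \tag{7.20}$$

where $v_2$ was defined in (7.6), and

$$\alpha_1^{(3)} = P^{(3)}(A_1|A_2); \quad \beta_1^{(3)} = P^{(3)}(A_1|\neg A_2); \quad \gamma_1^{(3)} = \alpha_1^{(3)} - \beta_1^{(3)}.$$

On substituting (7.20) into Eq.(7.19) we obtain

$$\begin{aligned} P^{(3)}(q) &= \alpha_0^{(3)}\bigl(\beta_1^{(3)} + \gamma_1^{(3)} v_2\bigr) + \beta_0^{(3)}\bigl(1 - \beta_1^{(3)} - \gamma_1^{(3)} v_2\bigr) \\ &= \beta_0^{(3)} + \gamma_0^{(3)}\beta_1^{(3)} + \gamma_0^{(3)}\gamma_1^{(3)} v_2, \end{aligned}$$

where

$$\alpha_0^{(3)} = P^{(3)}(q|A_1); \quad \beta_0^{(3)} = P^{(3)}(q|\neg A_1); \quad \gamma_0^{(3)} = \alpha_0^{(3)} - \beta_0^{(3)}, \tag{7.21}$$

which is like Eq.(7.18), except that the conditional probabilities are now at
third order.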

The pattern should by now be obvious. The $(m+2)$nd-order probability
of $q$ is

$$P^{(m+2)}(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\gamma_1\cdots\gamma_{m-1}\beta_m + \gamma_0\gamma_1\cdots\gamma_m v_{m+1}, \tag{7.22}$$

where we have suppressed the superscript $(m+2)$ on the conditional
probabilities, for reasons of legibility, but they are to be understood.
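Within the usual class we obtain, in the limit that $m$ goes to infinity,

$$P^{(\infty)}(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \cdots, \tag{7.23}$$

in which, with $A_0$ doing duty for the target proposition, $q$,

$$\alpha_n = P^{(\infty)}(A_n|A_{n+1}); \quad \beta_n = P^{(\infty)}(A_n|\neg A_{n+1}); \quad \gamma_n = \alpha_n - \beta_n, \tag{7.24}$$

for $n = 0, 1, 2, \ldots$.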
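Equation (7.22) can be checked numerically against the nested computation it summarizes. The sketch below drops the order superscripts, as the text does, and its inputs are arbitrary illustrative numbers, not values from the text. The function names are hypothetical:

- `series_value` evaluates the right-hand side of (7.22) term by term;
- `nested_value` starts from $v_{m+1}$ and applies the rule of total probability once per level, as in (7.17) to (7.21).

```python
def series_value(alphas, betas, v_top):
    """Right-hand side of (7.22) for alphas = [alpha_0..alpha_m],
    betas = [beta_0..beta_m] and v_top = v_{m+1}."""
    total, weight = 0.0, 1.0       # weight = gamma_0 * ... * gamma_{n-1}
    for a, b in zip(alphas, betas):
        total += weight * b        # add gamma_0 ... gamma_{n-1} beta_n
        weight *= a - b            # extend the product by gamma_n
    return total + weight * v_top  # final term gamma_0 ... gamma_m v_{m+1}


def nested_value(alphas, betas, v_top):
    """Same quantity, working down from the top of the hierarchy."""
    p = v_top
    for a, b in zip(reversed(alphas), reversed(betas)):
        p = a * p + b * (1 - p)    # one application of total probability
    return p


alphas = [0.9, 0.8, 0.7, 0.95]
betas = [0.2, 0.3, 0.1, 0.4]
print(series_value(alphas, betas, 0.6), nested_value(alphas, betas, 0.6))
```

The two functions agree up to rounding. With Rescher-style numbers (every α equal to 0.9, every β equal to 0.3, v equal to 0.9, and m = 1), both return 0.804, the value in (7.12).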

It will be clear that the above argumentation on the basis of the rule of total
probability is formally the same as our reasoning in Chapter 3. Indeed, (7.23)
has the same shape as (3.24), so an infinite series of higher-order probability 
statements makes sense. Like our regress of propositions that probabilisti- 
cally justify one another, the regress of higher-order probabilities is subject 
to fading foundations and to justification that gradually emerges as we go to 
probability statements of higher and higher level. 

Let us take stock. We have seen that higher-order probability statements 
are not as unintelligible as has often been thought. From Brian Skyrms and 
others we already learned that probabilities of the second order are not partic- 
ularly problematic; but we have now seen that the same applies to probability 
statements of any finite order, and even that infinite-order probabilities turn 
out to be coherent. The two regresses, the one from the previous chapters and 
the hierarchical one, are formally equivalent. 
However, formal equivalence is not yet equivalence in a very strict sense.
We have shown that both regresses have the same form, not that there is 
a bijection between the two. The latter we will prove now, for the really 
conscientious reader. 

We start straightforwardly, with the simplest form of Lewis’s claim ‘if
something is probable, something else must be certain’, i.e. the form where
the series consists of only one step, namely from $q$ to $A_1$. Here the two
interpretations of Lewis’s claim can be symbolized as follows:²⁴

(1) If $P(q|A_1) = \alpha$, and $P(q|\neg A_1) < \alpha$, then $A_1$ is certain,
i.e. $P(A_1) = 1$.
(2) It is certain that the probability of $q$ is $\alpha$,
i.e. $P^{(2)}(P^{(1)}(q) = \alpha) = 1$.

It is not difficult to see that (1) entails (2). If $P(q|A_1) = \alpha$, and $P(A_1) = 1$,
then

$$\begin{aligned} P(q) &= P(q|A_1)P(A_1) + P(q|\neg A_1)P(\neg A_1) \\ &= \alpha \times 1 + P(q|\neg A_1) \times 0 = \alpha, \end{aligned}$$

and if $P(q) = \alpha$, which we should write more explicitly as $P^{(1)}(q) = \alpha$, then
the probability that this is so is one, i.e. $P^{(2)}(P^{(1)}(q) = \alpha) = 1$.

It is a little trickier to show that (2) entails (1). The first thing we have
to do is to demonstrate that $P^{(2)}(P^{(1)}(q) = \alpha) = 1$ entails $P^{(1)}(q) = \alpha$. The
difficulty is that $P(A) = 1$ does not imply $A$ in an infinite probability space.
On the other hand $A$ does entail $P(A) = 1$, so if we substitute the proposition
‘$P^{(1)}(q) \neq \alpha$’ for $A$, we obtain

$$P^{(1)}(q) \neq \alpha \;\text{ entails }\; P^{(2)}(P^{(1)}(q) \neq \alpha) = 1.$$

By contraposition it follows that

$$\neg\bigl[P^{(2)}(P^{(1)}(q) \neq \alpha) = 1\bigr] \;\text{ entails }\; \neg\bigl[P^{(1)}(q) \neq \alpha\bigr],$$

or in other words that

$$P^{(2)}(P^{(1)}(q) \neq \alpha) \neq 1 \;\text{ entails }\; P^{(1)}(q) = \alpha. \tag{7.25}$$

However,

$$P^{(2)}(P^{(1)}(q) = \alpha) = 1 \;\text{ implies that }\; P^{(2)}(P^{(1)}(q) \neq \alpha) = 0,$$

and this trivially means that $P^{(2)}(P^{(1)}(q) \neq \alpha) \neq 1$. On combining this result
with (7.25), we conclude that $P^{(2)}(P^{(1)}(q) = \alpha) = 1$ entails $P^{(1)}(q) = \alpha$.²⁵

The rest of the demonstration employs only first-order probabilities, so we
will drop the superscript. So with $P(q) = \alpha$, and the rule of total probability,

$$P(q) = P(q|A_1)P(A_1) + P(q|\neg A_1)P(\neg A_1),$$

we see that, if $A_1$ is such that $P(q|A_1) = \alpha$ and $P(q|\neg A_1) < \alpha$, then

$$\alpha = \alpha \times P(A_1) + P(q|\neg A_1) \times P(\neg A_1).$$

Rearranging gives $[\alpha - P(q|\neg A_1)]\,[1 - P(A_1)] = 0$. Since the first factor is
positive, $P(A_1) = 1$, and so we have shown that (2) entails (1).

24 Recall that Lewis does not mention $P(q|\neg A_1)$, but we specifically include the
condition of probabilistic support.

The above shows that the two interpretations of Lewis’s claim are
equivalent when the series consists only of $q$ and $A_1$. However, the interesting
question is whether the generalization still holds when the series is longer,
and especially when it is of infinite length.

The generalization of (1) and (2) above to any finite series is given by:


(1’) If $P(A_n|A_{n+1}) = \alpha_n$ and $P(A_n|\neg A_{n+1}) = \beta_n$, with $\alpha_n > \beta_n$,
for $n = 0, 1, \ldots, m$, then it must be that $A_{m+1}$ is certain, i.e.
$P(A_{m+1}) = 1$.

(2’) It is certain that the $m$th-order probability of $q$ is $v_m$, i.e.
$P^{(m+2)}\bigl(P^{(m+1)}(\cdots(P^{(2)}(P^{(1)}(q) = v_0) = v_1)\cdots) = v_m\bigr) = 1$.


We have incorporated Reichenbach’s correction of Lewis’s position by
including $\beta_n$, i.e. the second term in the rule of total probability. The condition
of probabilistic support has also been included in order to exclude multiple
solutions.

We will now show that (1’) and (2’) are equivalent. The right-hand side
of (7.22) matches that of (3.20) in Chapter 3, excepting only that $v_{m+1}$ in
the former replaces $P(A_{m+1})$ in the latter. But $v_{m+1}$ is just the value of
$P^{(m+2)}(A_{m+1})$, so the two equations have the same form, term for term. Going
from (1’) to (2’) is immediate, whereas in the opposite direction we
must first demonstrate that $P^{(m+2)}(A_{m+1}) = 1$ entails $A_{m+1}$. But $A_{m+1}$ is a
probability statement, so the demonstration is just the same rigmarole as the one we
detailed above in going from (2) to (1). Thus the finite chains are isomorphic;
and therefore, if the conditional probabilities belong to the usual class,
the infinite chains have the same form too. Infinite-order probabilities are not
only cogent, but they also exhibit the phenomena that we have been talking
about, in particular those of fading foundations and emerging justification.

25 A shorter, intuitive ‘proof’ of this result is to say that $P(B) = 1$ entails $B$ almost
everywhere, and if $B$ is a statement about a measure, namely the proposition
$P^{(1)}(q) = \alpha$, then the restriction ‘almost everywhere’ loses its bite.

As an example of an infinite-order probability, we take as conditional
probabilities

$$\alpha_n = 1 - \frac{1}{n+2} + \frac{1}{n+3}; \qquad \beta_n = \frac{1}{n+3}.$$

These are the same as the ones we had in Eq.(3.21) of Chapter 3; but the
interpretation is now different. Here they refer to infinite-order conditional
probabilities. However, the equations have the same structure as those in
Chapter 3; and we can read off the infinite-order probability of $q$ by letting
$m$ go to infinity in Eq.(3.22), obtaining $P^{(\infty)}(q) = \tfrac{3}{4}$.
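For these particular conditional probabilities the weights in (7.22) can be computed exactly. Since $\gamma_n = (n+1)/(n+2)$, the product $\gamma_0\cdots\gamma_{n-1}$ equals $1/(n+1)$, and the $n$th term is $1/((n+1)(n+3))$. The sketch below evaluates the finite-order expression for increasing $m$ with exact `Fraction` arithmetic. It uses two extreme values for the top probability $v_{m+1}$, namely 0 and 1, which are illustrative choices. The function names are hypothetical.

```python
from fractions import Fraction


def alpha(n: int) -> Fraction:
    return 1 - Fraction(1, n + 2) + Fraction(1, n + 3)


def beta(n: int) -> Fraction:
    return Fraction(1, n + 3)


def finite_order_p_q(m: int, v_top: Fraction) -> Fraction:
    """P^(m+2)(q) from (7.22) with the conditional probabilities above."""
    total, weight = Fraction(0), Fraction(1)
    for n in range(m + 1):
        total += weight * beta(n)
        weight *= alpha(n) - beta(n)   # gamma_n = (n + 1) / (n + 2)
    return total + weight * v_top      # weight is now 1 / (m + 2)


for m in (0, 10, 100, 1000):
    low = finite_order_p_q(m, Fraction(0))
    high = finite_order_p_q(m, Fraction(1))
    print(m, float(low), float(high))
```

The last term in (7.22) is $v_{m+1}/(m+2)$. The spread between the two extreme choices of $v_{m+1}$ is therefore $1/(m+2)$, and both values close in on 3/4.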


7.5 Making Coins 


We have formally proved that an infinite series of higher-order probability 
statements is strictly equivalent to an infinite justificatory chain of the prob- 
abilistic kind. However, we might still have qualms: how can we understand 
the matter in an intuitive way? Being able to check all the steps in an alge- 
braical proof is one thing, it is quite another thing to ‘see through’ the series, 
as it were, and to appreciate what is actually going on. 

In this section we will try to allay these worries by offering a model that 
is intended to make the above abstract considerations concrete. The model is 
completely implementable; it comprises a procedure in which every step is 
specified. The model gives us a probability distribution over all the proposi- 
tions as well as over their conjunctions. It satisfies the Markov condition in a 
very natural way, and we do not have to impose this condition as an external
constraint.²⁶

Imagine two machines which produce trick coins. Machine $V_0$ produces
coins each of which has bias $\alpha_0$, by which we mean that each has probability
$\alpha_0$ of falling heads when tossed; machine $W_0$, on the other hand, makes coins
each of which has bias $\beta_0$. We define the propositions $q$ and $A_1$ as follows:


$q$ is the proposition ‘this coin will fall heads’


$A_1$ is the proposition ‘this coin comes from machine $V_0$’.


26 In Herzberg 2014 the Markov condition is imposed as an extra constraint. See 
also our discussion in Appendix A.8. 
We shall use the symbol $P_1(q)$ for the probability of a head when $A_1$ is true;
evidently it is the conditional probability of $q$, given $A_1$:

$$P_1(q) \equiv P(q|A_1) = P(\text{‘this coin will land heads’} \,|\, \text{‘this coin comes from machine } V_0\text{’}).$$

We know that $P_1(q) = \alpha_0$, for if the coin comes from machine $V_0$, the
probability of a head is indeed $\alpha_0$, for that is the bias produced by machine $V_0$.
Note that $P_1$ is conceptually not the same as $P^{(1)}$. The former is a conditional
probability, in this case the probability of $q$ given $A_1$; the latter is a first-order
unconditional probability.

An assistant is instructed to take many coins from both machines, and to
mix them thoroughly in a large pile. The numbers of coins that she must add
to the pile from machines $V_0$ and $W_0$ are determined by the properties of two
new machines: $V_1$, which produces trick coins with bias $\alpha_1$, and $W_1$, which
produces trick coins with bias $\beta_1$. A supervisor has told the assistant that the
relative number of coins that she should take from her machine $V_0$ should
be equal to the probability, $\alpha_1$, that a coin from $V_1$ would fall heads when
tossed. So if $\alpha_1$ is, for example, 1/4, then one quarter of the total number of
coins that the assistant takes from $V_0$ and $W_0$ are from $V_0$; the rest are from $W_0$.²⁷

The assistant takes one coin at random from her pile and she tosses it.
Understanding $q$ now to refer to this coin, we can deduce the probability of
$q$ in the new situation. Indeed, if $A_2$ is the proposition:


$A_2$ = ‘the relative number of $V_0$ coins in the assistant’s pile is determined
by the bias towards heads of the $V_1$ coins’,


then we can ask what the probability is that the assistant’s coin falls heads,
given that $A_2$ is true. We use the symbol $P_2(q)$ for this probability. It is equal
to the conditional probability of $q$, given $A_2$, which can be calculated from
the following variation of the rule of total probability:²⁸


27 For the sake of this story, we limit $\alpha_1$ to be a rational number, so it makes sense
to say that the number of coins to be taken from $V_0$ is equal to $\alpha_1$ times the total
number taken from $V_0$ and $W_0$. Similarly, in the subsequent discussion, the biases
should all be considered to be rational numbers. Since the rationals are dense in the
reals, this is not an essential limitation.

28 The proof of Eq.(7.26) goes as follows:

$$\begin{aligned} P(q \wedge A_2) &= P(q \wedge A_1 \wedge A_2) + P(q \wedge \neg A_1 \wedge A_2) \\ &= P(q|A_1 \wedge A_2)P(A_1 \wedge A_2) + P(q|\neg A_1 \wedge A_2)P(\neg A_1 \wedge A_2). \end{aligned}$$

On dividing both sides of this equation by $P(A_2)$ we obtain (7.26).
$$P_2(q) \equiv P(q|A_2) = P(q|A_1 \wedge A_2)P(A_1|A_2) + P(q|\neg A_1 \wedge A_2)P(\neg A_1|A_2). \tag{7.26}$$

By definition, $P(q|A_1 \wedge A_2)$ is the probability that the assistant’s coin will fall
heads, on condition that this coin has come from machine $V_0$, and that the
number of $V_0$ coins in the pile is subject to the condition specified by $A_2$.
Similarly $P(q|\neg A_1 \wedge A_2)$ is the probability that the assistant’s coin will fall
heads, on condition that this same coin has not come from machine $V_0$, and
that $A_2$ is true.

This series of procedures gives rise to a Markov chain. For the condition
that the assistant’s coin has come from machine $V_0$ is already enough
to ensure that the probability that this coin will fall heads is $\alpha_0$; and that
situation is not affected by the condition that $A_2$ is true, so $P(q|A_1 \wedge A_2) =
P(q|A_1) = \alpha_0$. Likewise, the condition that the assistant’s coin has not come
from machine $V_0$ guarantees that it has come from machine $W_0$, and therefore
ensures that the probability of a head is $\beta_0$; again, that is not affected by $A_2$,
so $P(q|\neg A_1 \wedge A_2) = P(q|\neg A_1) = \beta_0$. In Reichenbach’s locution, $A_1$ is said to
screen off $q$ from $A_2$.²⁹ The screening-off or Markov condition will turn out
to be an essential part of our model. We shall show that the model, as well
as the abstract system of which it is an interpretation, is consistent, even if
the abstract system does not itself satisfy the Markov condition.

The Markov condition enables us to simplify (7.26) as follows:

$$\begin{aligned} P_2(q) = P(q|A_2) &= P(q|A_1)P(A_1|A_2) + P(q|\neg A_1)P(\neg A_1|A_2) \\ &= \alpha_0\alpha_1 + \beta_0(1 - \alpha_1), \end{aligned} \tag{7.27}$$

where, as usual, we employ $\beta_0$ as shorthand for $P(q|\neg A_1)$. We conclude that,
if the assistant repeats the procedure of tossing a coin from her pile many
times (with replacement and randomization), the resulting relative frequency
of heads would be approximately equal to $P_2(q)$, as given by (7.27). The
approximation would get better and better as the number of tosses increases.
More carefully: the probability that the relative number of heads will differ
by less than any assigned $\varepsilon > 0$ from $\alpha_0\alpha_1 + \beta_0(1 - \alpha_1)$ will tend to unity as
the number of tosses tends to infinity.
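The frequency claim about (7.27) can be illustrated by a small simulation, which is not part of the text's own model description. The setup is as follows, with α0 and β0 chosen purely for illustration:

- the biases are α0 = 0.9 and β0 = 0.3;
- α1 = 1/4, the example share used above;
- a pile of 400 coins is built accordingly;
- the assistant draws a coin at random (with replacement) and tosses it many times.

The observed frequency of heads should then lie close to α0α1 + β0(1 − α1) = 0.45. The function name and parameters are hypothetical.

```python
import random


def simulate_assistant(alpha0, beta0, share_v0, pile_size, tosses, seed=0):
    """Relative frequency of heads when the assistant repeatedly draws a
    coin from her pile (with replacement) and tosses it."""
    rng = random.Random(seed)
    n_v0 = round(share_v0 * pile_size)          # coins taken from machine V0
    # Each coin is represented by its bias towards heads.
    pile = [alpha0] * n_v0 + [beta0] * (pile_size - n_v0)
    heads = 0
    for _ in range(tosses):
        bias = rng.choice(pile)                  # draw a coin at random
        heads += rng.random() < bias             # toss it: True counts as 1
    return heads / tosses


alpha0, beta0, alpha1 = 0.9, 0.3, 0.25           # illustrative values
predicted = alpha0 * alpha1 + beta0 * (1 - alpha1)
observed = simulate_assistant(alpha0, beta0, alpha1, pile_size=400, tosses=200_000)
print(predicted, observed)
```

With 200,000 tosses the standard deviation of the observed frequency is roughly 0.001. Agreement to about two decimal places is therefore what the law of large numbers leads one to expect.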

It is important to understand that $P_2(q)$ is not simply a correction to $P_1(q)$.
Rather, they refer to two different operations. In the first operation it
is certain that the assistant takes a coin from machine $V_0$. In the second
operation something else is certain, namely that the number of $V_0$ coins in the pile
consisting of $V_0$ coins and $W_0$ coins is determined by the bias towards heads
of a coin from machine $V_1$. The consequence of this difference is substantial,
for in the second operation it is no longer sure that the assistant takes a coin
that comes from $V_0$. Instead of being only a correction, $P_2(q)$ is the result of
a longer and more sophisticated procedure than is $P_1(q)$.

29 Reichenbach 1956, 159-167.

So much for the description of the model of the first iteration of the
regress, constrained by the veridicality of $A_2$. In the next iteration, the
supervisor receives instructions from an artificial intelligence that simulates the
working of yet another duo of machines, $V_2$ and $W_2$, which produce
simulated coins with biases $\alpha_2$ and $\beta_2$, respectively. The supervisor makes a large
pile of coins from his machines $V_1$ and $W_1$; and he adjusts the relative number
of coins that he takes from $V_1$ to be equal to the probability that a simulated
coin from $V_2$ would fall heads when tossed. So if $\alpha_2$ is, for example, 1/2, then
equal numbers of coins will be taken from each of the machines $V_1$ and $W_1$.

Let $A_3$ be the proposition:


$A_3$ = ‘the relative number of $V_1$ coins in the supervisor’s pile is determined
by the bias towards heads of the $V_2$ coins’.


If $A_3$ is true, then the probability of $A_2$ is equal to $\alpha_2$, that is to say
$P(A_2|A_3) = \alpha_2$. Again, screening off is essential here: $A_2$ screens off $A_1$ from
$A_3$. So we may write

$$\begin{aligned} P(A_1|A_3) &= P(A_1|A_2 \wedge A_3)P(A_2|A_3) + P(A_1|\neg A_2 \wedge A_3)P(\neg A_2|A_3) \\ &= P(A_1|A_2)P(A_2|A_3) + P(A_1|\neg A_2)P(\neg A_2|A_3) \\ &= \alpha_1\alpha_2 + \beta_1(1 - \alpha_2). \end{aligned} \tag{7.28}$$
This value of $P(A_1|A_3)$ is handed down to the assistant, and she reruns her
procedure, but with $P(A_1|A_3)$ in place of $P(A_1|A_2)$. Since $A_1$ screens off $q$
from $A_3$ (and from all the higher $A_n$), we calculate

$$\begin{aligned} P_3(q) \equiv P(q|A_3) &= P(q|A_1 \wedge A_3)P(A_1|A_3) + P(q|\neg A_1 \wedge A_3)P(\neg A_1|A_3) \\ &= P(q|A_1)P(A_1|A_3) + P(q|\neg A_1)P(\neg A_1|A_3) \\ &= \alpha_0 P(A_1|A_3) + \beta_0[1 - P(A_1|A_3)], \end{aligned} \tag{7.29}$$

in which we are to replace $P(A_1|A_3)$ by $\alpha_1\alpha_2 + \beta_1(1 - \alpha_2)$, in accordance
with Eq.(7.28). This yields

$$P_3(q) = P(q|A_3) = \beta_0 + (\alpha_0 - \beta_0)\beta_1 + (\alpha_0 - \beta_0)(\alpha_1 - \beta_1)\alpha_2. \tag{7.30}$$


The relative frequency of heads that the assistant would observe will be
approximately equal to $P_3(q)$, as given by (7.30), with the usual probabilistic
proviso. The above constitutes a model of the second iteration of the regress,
constrained by the condition that the simulated coin of the artificial intelligence
comes from the simulated machine $V_2$, that is by the veridicality of
$A_3$.

This procedure must be repeated ad infinitum. A subprogram encodes the
working of yet another duo of virtual machines, $V_3$ and $W_3$, which simulate
the production of coins with biases $\alpha_3$ and $\beta_3$, and so on, all under the
assumption that $A_n$ is the proposition:


$A_n$ = ‘the relative number of $V_{n-2}$ coins in the relevant pile is determined
by the bias towards heads of the $V_{n-1}$ coins’.


From this it follows that at the $(m+2)$nd step of the iteration one finds

$$\begin{aligned} P_{m+2}(q) \equiv P(q|A_{m+2}) = {} & \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\cdots\gamma_{m-1}\beta_m \\ & + \gamma_0\cdots\gamma_m\alpha_{m+1}, \end{aligned} \tag{7.31}$$

where we have introduced the customary abbreviation $\gamma_n = \alpha_n - \beta_n$. Under
the requirement that the conditional probabilities belong to the usual class,
the sequence $P_1(q), P_2(q), P_3(q), \ldots$ converges to a limit, $P_\infty(q)$, that is
well-defined. Moreover, under the same condition the last term in (7.31), namely
$\gamma_0\cdots\gamma_m\alpha_{m+1}$, tends to zero as $m$ tends to infinity, so finally

$$P_\infty(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \cdots \tag{7.32}$$

This has the same form as (7.23).

In this way we have designed a set of procedures that is clear-cut in the
sense that it could in principle be performed to any finite number of steps,
where the successive results for the probability that the assistant throws
a head get closer and closer to a limiting value that can be calculated.
To be precise, for any $\varepsilon > 0$, and for any set of conditional probabilities
that belongs to the usual class, one can calculate an integer, $N$, such that
$|P_N(q) - P_\infty(q)| < \varepsilon$, and one could actually carry out the procedures to
determine $P_N(q)$. That is, one can get as close to the limit of the infinite regress
of probabilities as one likes.
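The existence of such an $N$ can be made concrete for one case. Purely as an illustrative assumption, the sketch below gives the machines the conditional probabilities of the Section 7.4 example, $\alpha_n = 1 - 1/(n+2) + 1/(n+3)$ and $\beta_n = 1/(n+3)$, for which the limiting value is 3/4. The hypothetical function `coin_model` follows the model's own procedure. The bias $\alpha_{N-1}$ fixes the share of $V_{N-2}$ coins, and that share is handed down one level at a time by total probability, ending with $P_N(q) = P(q|A_N)$. For this choice one can check that $P_N(q) = 3/4 + 1/(2(N+1)(N+2))$. The first $N$ with $|P_N(q) - 3/4| < 0.001$ is therefore 21.

```python
from fractions import Fraction


def alpha(n: int) -> Fraction:
    """Bias of machine V_n (illustrative choice, see lead-in)."""
    return 1 - Fraction(1, n + 2) + Fraction(1, n + 3)


def beta(n: int) -> Fraction:
    """Bias of machine W_n (illustrative choice, see lead-in)."""
    return Fraction(1, n + 3)


def coin_model(N: int) -> Fraction:
    """P_N(q) = P(q | A_N) for N >= 1, computed by handing shares down."""
    share = alpha(N - 1)                 # P(A_{N-1} | A_N)
    for k in range(N - 2, -1, -1):
        # after this step, share = P(A_k | A_N), with A_0 standing for q
        share = alpha(k) * share + beta(k) * (1 - share)
    return share


def first_n_within(eps: Fraction, limit: Fraction = Fraction(3, 4)) -> int:
    """Smallest N with |P_N(q) - limit| < eps."""
    N = 1
    while abs(coin_model(N) - limit) >= eps:
        N += 1
    return N


print([str(coin_model(N)) for N in (1, 2, 3)])   # ['5/6', '19/24', '31/40']
print(first_n_within(Fraction(1, 1000)))          # 21
```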

The probabilities in this model are objective, but that is not the essential
point. What is essential is that the structure described here is a genuine
model, which implies that two desiderata have been met. First, the model
is well-defined and free from contradictions. Second, it maps into the infi- 
nite hierarchy of probabilities. The model has the same form as the prob- 
abilistic regress of Chapter 3, for which we have already given a proof of 
convergence. It also matches the series for the infinite-order probability of 
Eq.(7.23), thereby providing a model for the abstract system of Section 7.4. 
Chapter 8
Loops and Networks 


Abstract 

The analysis so far concerned only one-dimensional epistemic chains. In 
this chapter two extensions are investigated. The first treats loops rather than 
chains. We show that generally, i.e. in what we have called the usual class, 
infinite loops yield the same value for the target as do infinite chains; it is 
only in the exceptional class that the values differ. The second extension 
involves multi-dimensional networks, where the chains fan out in many dif- 
ferent directions. As it turns out, the uniform version of the networks yields 
the fractal iteration of Mandelbrot. Surprising as it may seem, justificatory 
systems that mushroom out greatly resemble fractals. 


8.1 Tortoises and Serpents 


In 1956 Wilfrid Sellars famously diagnosed the malaise of epistemology as 
an unpalatable either/or: 


One seems forced to choose between the picture of an elephant which rests 
on a tortoise (What supports the tortoise?) and the picture of a great Hegelian 
serpent of knowledge with its tail in its mouth (Where does it begin?). Neither 
will do.¹


Up to this point our focus has been on finite and infinite chains of proposi- 
tions. We looked, as it were, at an elephant which rests on a tortoise, which 
in turn might rest on a second tortoise, and so on, without end. Pace Sellars’ 
pessimism, we have seen that such structures are not particularly problematic 
if one takes seriously that the relation of support is probabilistic. 


1 Sellars 1956, 300.
There are now two ways in which we could extend our investigation and
go beyond one-dimensional chains. The first is to keep the one-dimension- 
ality, but to look at loops rather than chains: this would take us to the sec- 
ond horn of Sellars’s dilemma, where knowledge is pictured as Kundalini 
swallowing its own tail. The other way is to give up one-dimensionality al- 
together and to study multi-dimensional networks. This would take us to the 
coherentist caucus in epistemology, or rather to an infinitist version of it, in 
which ultimately the network stretches out indefinitely in infinitely many di- 
rections. It might seem that such a version will be especially vulnerable to 
the standard objection to coherentism, according to which coherentist net- 
works of knowledge hang in the air without making contact with the world. 
Indeed, as Richard Fumerton noted, if we worry about “the possibility of 
completing one infinitely long chain of reasoning, [we] should be downright 
depressed about the possibility of completing an infinite number of infinitely 
long chains of reasoning”.²

Remarkably enough, however, the opposite is the case. Since the
connections between the propositions in the network are probabilistic in character,
we are dealing with conditional probabilities. As we explained in Section 
4.4, the conditional probabilities together carry the empirical thrust, and this 
is even more so in a multi-dimensional system than in a structure of only 
one dimension, for the simple reason that now there are more conditional 
probabilities that may be linked to the world. 

Extending the chains to networks thus enables us to catch it all: to develop 
a form of coherentism which not only is infinitist, but also acknowledges the 
foundationalist maxim that a body of knowledge worthy of the name must 
somehow make contact with the world.³

2 Fumerton 1995, 57.

3 Thus we do not have many quibbles with William Roche when he argues that
foundationalism, if suitably generalized, can be reconciled with infinite regresses
of probabilistic support (Roche 2016). Much depends on what is meant by
foundationalism: as we indicated in Section 4.4, we do not want to become embroiled
in a verbal dispute. Some commentators write as if foundationalism were the sole
guardian of empirical credibility and connection to the world. Although others might
find that position unduly imperialistic, we do not object to being called
foundationalists in that sense. We have no issue with a form of foundationalism that takes into
account fading foundations and the related concept of trading off as it is applied
to doxastic justificatory chains. Our concern is less about the classification of our
results than about the results themselves.

We start in Section 8.2 by discussing one-dimensional loops. We will see
that, if justification is interpreted probabilistically, then it is in general
unproblematic to maintain that a target is justified by a loop. In Section 8.3 we
turn to multi-dimensional networks, where the tentacles stretch out in many 
different directions. In Section 8.4 we explain that such a multi-dimensional 
network takes on a very interesting and intriguing shape when it goes to 
infinity. Surprising and somewhat strange as it may sound, if epistemic jus- 
tification is interpreted probabilistically, and if we accept that it can go on 
without end, then justification is tantamount to constructing a fractal of the 
sort that Benoit Mandelbrot introduced many years ago. 

In the final section we explain what happens when the multi-dimension- 
ality springs from the connections in the network rather than from the nodes, 
i.e. when it originates from the conditional probabilities rather than from the 
unconditional ones. We shall see that in a generalized sense the Mandelbrot 
construction is preserved.⁴


8.2 One-Dimensional Loops 


Finite loops embody the simplest coherentist system. What about infinite 
ones? It seems that an infinite loop cannot really be called a loop, since there 
is no end of the tail that the Hegelian serpent can swallow. A loop after all in- 
volves a repeat of the same; it may be long, indeed more than cosmologically 
long, but it seems that it may not be infinite, on pain of having no repetition 
at all. Even Henri Poincaré, when he formulated his recurrence theorem, had 
to assume that the universe is finite in spatial extent and of finite energy. 

However, from the fact that a finite loop differs from an infinite ‘loop’, 
it does not follow that an infinite loop is in fact an infinite chain. Our in- 
vestigation in this section will explain that such a conclusion would be un- 
warranted. In what we have called the usual class, the infinite loop indeed 
produces the same result as does the corresponding infinite chain; but in the 
exceptional class infinite loops and infinite chains yield different results, as 
we shall show. 

We saw in Chapter 3 that the probability of the target in a finite linear 
chain can be written as in (3.20), where we have reinstated q in place of Ao: 


$$P(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \dots + \gamma_0\cdots\gamma_{m-1}\beta_m + \gamma_0\cdots\gamma_m\,P(A_{m+1})\,.$$


4 Section 8.2 in this chapter, about the loops, is based on Atkinson and Peijnenburg 
2010a; Sections 8.3 and 8.4, which deal with networks, are based on Atkinson and 
Peijnenburg 2012. 
The general formulation of a finite loop with m + 1 propositions has a similar
form, except that the (m + 1)st proposition is q itself. Mathematically, there
is no problem if we insert $A_{m+1} = q$ into the above equation to yield


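$$P(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \dots + \gamma_0\cdots\gamma_{m-1}\beta_m + \gamma_0\cdots\gamma_m\,P(q)\,,$$

and solving this for P(q) yields

$$P(q) = \frac{\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \dots + \gamma_0\cdots\gamma_{m-1}\beta_m}{1 - \gamma_0\gamma_1\cdots\gamma_m}\,, \qquad (8.1)$$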


which is well-defined, on condition that $\gamma_0\gamma_1\cdots\gamma_m$ is not equal to unity.⁵
With that proviso, the solution demonstrates the viability of the coherentist 
scenario in its simplest form, that of a finite one-dimensional loop. 
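As a minimal sketch (our own illustration in Python, not part of the original analysis), Eq. (8.1) can be evaluated from lists of the β_n and of γ_n = α_n − β_n. The function names and the sample numbers are hypothetical. The second function walks once round the loop with P(A_n) = β_n + γ_n P(A_{n+1}), starting from A_{m+1} = q, and so checks that the value given by (8.1) reproduces itself.

```python
from math import prod, isclose

def loop_probability(betas, gammas):
    """P(q) for the finite loop q <- A_1 <- ... <- A_m <- q, via Eq. (8.1).

    betas[n] = beta_n and gammas[n] = gamma_n = alpha_n - beta_n, n = 0..m,
    with A_0 = q and A_{m+1} = q.
    """
    m = len(betas) - 1
    numerator = sum(betas[n] * prod(gammas[:n]) for n in range(m + 1))
    denominator = 1 - prod(gammas)
    if denominator == 0:
        raise ValueError("gamma_0 ... gamma_m = 1: the loop is not well-defined")
    return numerator / denominator

def around_the_loop(betas, gammas, p_q):
    """Apply P(A_n) = beta_n + gamma_n P(A_{n+1}) from A_{m+1} = q back to A_0 = q."""
    p = p_q
    for beta, gamma in zip(reversed(betas), reversed(gammas)):
        p = beta + gamma * p
    return p

# Hypothetical loop with four links (alpha_n = beta_n + gamma_n stays below 1).
betas = [0.1, 0.3, 0.2, 0.05]
gammas = [0.6, 0.5, 0.7, 0.9]
p = loop_probability(betas, gammas)
print(p)
print(isclose(around_the_loop(betas, gammas, p), p))  # True: self-consistent
```

The closed form avoids any need for a starting value: the loop supplies its own "ground".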

The fact that a self-supporting finite loop or ring makes good mathemat- 
ical sense is of course not enough. Does it also make sense elsewhere? Can 
a loop that closes upon itself occur in reality? A temporal example of such a 
loop is difficult to come by in the real world, but it can occur in the science
fiction of time travel (the plot of the film Back to the Future Part II). Let q be
a proposition stating that young Biff decides in 1955 to use the 2015 edition of
the sports almanac, $A_1$ a proposition asserting that he continues his successful
career as a bettor until 2015, and $A_2$ a proposition explaining how old Biff
succeeds in borrowing Doc Brown’s time machine in 2015 and returns to 1955 in
order to give the almanac to his younger self. $A_3 = q$ would then be a proposition
stating that young Biff decides in 1955 to use the 2015 edition of the sports
almanac ... and so on.

In fact, the events need not follow one another in time. Consider the fol- 
lowing three propositions: 


C: “Peter read parts of the Critique of Pure Reason”. 
P: “Peter is a philosopher”. 
S: “Peter knows that Kant defended the synthetic a priori”. 


Assuming that all philosophers read at least parts of the Critique of Pure 
Reason as undergraduates, if Peter is a philosopher, then he read parts of the 
Critique. Of course, even if he is not a philosopher, he may still have read 
Kant’s magnum opus. If Peter knows that Kant defended the synthetic a pri- 
ori, he very likely is a philosopher, whereas if he does not, he is probably not 
a philosopher, although of course he might be an exceptionally incompetent 


5 If $\gamma_0\cdots\gamma_m = 1$, it follows that each $\gamma_n$ is equal to one. But then all the $\alpha_n$ are
equal to one also, and all the $\beta_n$ are equal to zero, which is the condition of
bi-implication. This already indicates that a loop does not make sense when entailment
relations are involved.

one, not having understood anything of Kant or the Critique. Finally, if he
read the Critique, he quite likely knows that Kant defended the synthetic a 
priori, whereas this is rather less likely if he never opened the book. Here 
then is a simple finite loop, consisting of a fixed number of links, namely 
three: 

$$C \leftarrow P \leftarrow S \leftarrow C\,, \qquad (8.2)$$


where the arrow indicates that the proposition at the right-hand side proba- 
bilistically supports the one at the left. 

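We can make loop (8.2) nonuniform by investing the three propositions C,
P and S with, for example, the following dissimilar values for the conditional
probabilities:


$$\begin{array}{llll}
C: & \alpha_0 = P(C|P) = 1; & \beta_0 = P(C|\neg P) = \frac{1}{10}; & \gamma_0 = \frac{9}{10}\\[4pt]
P: & \alpha_1 = P(P|S) = \frac{9}{10}; & \beta_1 = P(P|\neg S) = \frac{1}{5}; & \gamma_1 = \frac{7}{10}\\[4pt]
S: & \alpha_2 = P(S|C) = \frac{4}{5}; & \beta_2 = P(S|\neg C) = \frac{2}{5}; & \gamma_2 = \frac{2}{5}
\end{array}$$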


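Then the unconditional probabilities⁶ are


$$\begin{aligned}
P(C) &= \frac{\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2}{1 - \gamma_0\gamma_1\gamma_2} \approx 0.711\\[4pt]
P(P) &= \frac{\beta_1 + \gamma_1\beta_2 + \gamma_1\gamma_2\beta_0}{1 - \gamma_0\gamma_1\gamma_2} \approx 0.679\\[4pt]
P(S) &= \frac{\beta_2 + \gamma_2\beta_0 + \gamma_2\gamma_0\beta_1}{1 - \gamma_0\gamma_1\gamma_2} \approx 0.684
\end{aligned}$$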
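The following sketch recomputes these numbers. The conditional probabilities in it are assumptions on our part (α_0 = 1, β_0 = 1/10, α_1 = 9/10, β_1 = 1/5, α_2 = 4/5, β_2 = 2/5), chosen because they reproduce the quoted P(C) ≈ 0.711 and P(P) ≈ 0.679. The hypothetical helper `loop_from` simply applies (8.1) after rotating the loop so that each proposition in turn plays the role of the target.

```python
from math import prod

# Loop C <- P <- S <- C; assumed conditional probabilities (see lead-in).
names = ["C", "P", "S"]
alphas = [1.0, 0.9, 0.8]
betas = [0.1, 0.2, 0.4]
gammas = [a - b for a, b in zip(alphas, betas)]   # gamma_n = alpha_n - beta_n

def loop_from(start, betas, gammas):
    """Eq. (8.1) with the loop read starting at position `start`."""
    k = len(betas)
    b = [betas[(start + i) % k] for i in range(k)]
    g = [gammas[(start + i) % k] for i in range(k)]
    numerator = sum(b[n] * prod(g[:n]) for n in range(k))
    return numerator / (1 - prod(g))

probs = {name: loop_from(i, betas, gammas) for i, name in enumerate(names)}
print({name: round(p, 3) for name, p in probs.items()})
# {'C': 0.711, 'P': 0.679, 'S': 0.684}
```

One can also confirm the relation P(C) = β_0 + γ_0 P(P), and its two companions, directly from `probs`.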


In the above example the number of links was fixed: there were exactly 
three propositions. Here is an example in which the number of links, m, 
can be whatever one likes, showing the cogency of any finite loop. Consider 
again the example (3.21) in Section 3.5: 

$$\alpha_n = 1 - \frac{1}{n+2} + \frac{1}{n+3}\,, \qquad \beta_n = \frac{1}{n+3}\,, \qquad \gamma_n = \alpha_n - \beta_n = 1 - \frac{1}{n+2}\,.$$


6 As they must, these numbers satisfy
$P(C) = \beta_0 + \gamma_0 P(P)$, $P(P) = \beta_1 + \gamma_1 P(S)$, $P(S) = \beta_2 + \gamma_2 P(C)$.

Incidentally, there is a good reason for considering a loop of at least three
propositions. For in a ‘loop’ of two links only, there are only three independent
unconditional probabilities, for example $P(q)$, $P(A_1)$ and $P(q \wedge A_1)$, whereas there
are four conditional probabilities around the loop, $P(q|A_1)$, $P(q|\neg A_1)$, $P(A_1|q)$ and
$P(A_1|\neg q)$, so there must be a relation between them. This difficulty does not arise
for a loop of three links, for in this case there are seven independent unconditional
probabilities and only six conditional probabilities around the loop. With more than
three links on the loop the difference between the numbers of unconditional and
conditional probabilities is even greater.
This example is nonuniform (i.e. the conditional probabilities, $\alpha_n$ and $\beta_n$, are
not the same for different n), and it is in the usual class. It is shown in (A.18)
in Appendix A.5 that Eq. (8.1) reduces to


$$P(q) = \frac{3}{4} - \frac{1}{4(m+3)}\,. \qquad (8.3)$$


In Table 8.1 the values of P(q) for the chain are reproduced in the first line, 
while the corresponding values for the loop, as specified in (8.3), are given 
in the second line. The difference between the two cases is that, while for the 
chain we had to specify a value for the probability of the ground, which we 
put equal to a half, for the loop no such specification is required. 


Table 8.1 Probability of q for chain and loop; $P(A_{m+1}) = \tfrac{1}{2}$ for the chain

$\alpha_n = P(A_n|A_{n+1}) = 1 - \frac{1}{n+2} + \frac{1}{n+3}$;   $\beta_n = P(A_n|\neg A_{n+1}) = \frac{1}{n+3}$

Number of A_n        1      2      5      10     25     50     75     100    ∞
P(q) with chain     .625   .650   .688   .712   .732   .741   .744   .745   .750
P(q) with loop      .688   .700   .719   .731   .741   .745   .747   .748   .750


The probability of the target rises smoothly as the chain, or the loop, becomes 
longer, eventually reaching the value of three-quarters for both the infinite 
chain and the infinite loop. As can be seen, the values of P(q) for the loop
converge somewhat more quickly than do those for the chain. 
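A short exact-arithmetic sketch (Python's `fractions` module; the helper names are ours) recomputes the entries of Table 8.1 from α_n and β_n as given in the table heading, with P(A_{m+1}) = 1/2 for the chain. It also checks the loop values against the closed form 3/4 − 1/(4(m+3)), which is our reading of (8.3). Values are printed to four decimals so that rounding conventions do not blur the comparison.

```python
from fractions import Fraction
from math import prod

def alpha(n):
    # alpha_n = P(A_n | A_{n+1}) = 1 - 1/(n+2) + 1/(n+3)
    return 1 - Fraction(1, n + 2) + Fraction(1, n + 3)

def beta(n):
    # beta_n = P(A_n | not A_{n+1}) = 1/(n+3)
    return Fraction(1, n + 3)

def gamma(n):
    # gamma_n = alpha_n - beta_n, which simplifies to (n+1)/(n+2)
    return alpha(n) - beta(n)

def partial_sum(m):
    """beta_0 + gamma_0*beta_1 + ... + gamma_0*...*gamma_{m-1}*beta_m."""
    return sum(beta(n) * prod(gamma(k) for k in range(n)) for n in range(m + 1))

def chain(m, ground=Fraction(1, 2)):
    """Finite chain (3.20), with ground probability P(A_{m+1}) = ground."""
    return partial_sum(m) + prod(gamma(k) for k in range(m + 1)) * ground

def loop(m):
    """Finite loop (8.1): A_{m+1} is q itself, so no ground is needed."""
    return partial_sum(m) / (1 - prod(gamma(k) for k in range(m + 1)))

for m in (1, 2, 5, 10, 25, 50, 75, 100):
    print(f"m = {m:3d}   chain {float(chain(m)):.4f}   loop {float(loop(m)):.4f}")
```

Exact fractions are used so that agreement with the closed form can be tested with `==` rather than with a tolerance.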

The agreement between the infinite chain and the infinite loop is not lim- 
ited to this special model, for it is an attribute of any example in the usual 
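class. This can be seen quite easily, for when the product $\gamma_0\gamma_1\cdots\gamma_m$ tends to
zero as m goes to infinity, the loop (8.1) yields the infinite, convergent series


$$P(q) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \cdots\,, \qquad (8.4)$$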


as for the infinite chain in the usual class. 

The uniform case, in which the conditional probabilities are the same from 
link to link, forms an interesting special case, for then the value of P(q) 
turns out to be always the same, no matter how many links there are in the 
loop. This can already be seen without doing the actual calculation. Since the 
propositions are uniformly connected round and round the loop ad infinitum, 
we can immediately understand why it should make no difference how many 
links there are: the value of P(q) should be the same as that for an infinite,
uniform loop. The actual calculation goes as follows: (8.1) becomes 
$$P(q) = \frac{\beta\,(1 + \gamma + \gamma^2 + \dots + \gamma^m)}{1 - \gamma^{m+1}}\,. \qquad (8.5)$$


The finite geometrical series $1 + \gamma + \gamma^2 + \dots + \gamma^m$ is equal to $(1 - \gamma^{m+1})/(1 - \gamma)$,
and on substituting this we see that the factor $(1 - \gamma^{m+1})$ cancels, so


$$P(q) = \frac{\beta}{1 - \gamma} = \frac{\beta}{1 - \alpha + \beta}\,.$$


Indeed this does not depend on m at all, so the number of links may be finite, 
or infinite, with no change in the value of P(q). It will be recognized that this 
value is precisely the same as that for the infinite, uniform chain (see Section 
3.7). 
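To see the independence from m concretely, here is a small exact check of (8.5). The values α = 3/4 and β = 1/5 are illustrative choices of ours; any uniform choice with γ = α − β < 1 would do.

```python
from fractions import Fraction

def uniform_loop(alpha, beta, m):
    """Eq. (8.1) when alpha_n = alpha and beta_n = beta for every n, i.e. (8.5)."""
    gamma = alpha - beta
    numerator = beta * sum(gamma**n for n in range(m + 1))
    return numerator / (1 - gamma ** (m + 1))

alpha, beta = Fraction(3, 4), Fraction(1, 5)   # illustrative values only
values = {uniform_loop(alpha, beta, m) for m in range(20)}
print(values)                                  # {Fraction(4, 9)}
print(beta / (1 - alpha + beta))               # 4/9
```

Collecting the results in a set makes the point directly: twenty loop lengths give one single value.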

So much for the usual class. What of the exceptional class, in which the 
infinite product of the γ’s is not zero? As we have seen, here the chain fails,
in the infinite limit, to produce a definite answer for the target probability. 
The infinite loop on the other hand yields a unique value. To illustrate this, 
consider again the example (3.25): 


$$\beta_n = \frac{1}{(n+2)(n+3)}\,, \qquad \gamma_n = \frac{(n+1)(n+3)}{(n+2)^2} = 1 - \frac{1}{(n+2)^2}\,.$$


We find now from (8.1) that 


$$P(q) = \frac{3}{4} - \frac{1}{4(m+3)}\,, \qquad (8.6)$$

as we explain in detail in Appendix A.6, and this has the perfectly definite
limit 3/4. Thus the infinite chain and the infinite loop only differ in the
exceptional class. There the infinite chain fails to give a definite answer, but the
infinite loop does so.⁷
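The contrast can be checked numerically. The sketch below assumes that example (3.25) has β_n = 1/((n+2)(n+3)) and γ_n = (n+1)(n+3)/(n+2)², which is our reconstruction of it. With these values the product γ_0⋯γ_m tends to 1/2 rather than to zero, so the chain's answer shifts with whatever probability is assigned to the ground, whereas the loop needs no ground at all.

```python
from fractions import Fraction
from math import prod

def beta(n):   # beta_n = 1/((n+2)(n+3)), assumed form of example (3.25)
    return Fraction(1, (n + 2) * (n + 3))

def gamma(n):  # gamma_n = (n+1)(n+3)/(n+2)^2 = 1 - 1/(n+2)^2
    return Fraction((n + 1) * (n + 3), (n + 2) ** 2)

def partial_sum(m):
    return sum(beta(n) * prod(gamma(k) for k in range(n)) for n in range(m + 1))

def product(m):
    """gamma_0 * gamma_1 * ... * gamma_m."""
    return prod(gamma(k) for k in range(m + 1))

def chain(m, ground):
    return partial_sum(m) + product(m) * ground

def loop(m):
    return partial_sum(m) / (1 - product(m))

m = 200
print(float(product(m)))                       # tends to 1/2, not to 0
print(float(chain(m, 0)), float(chain(m, 1)))  # the chain depends on the ground
print(float(loop(m)))                          # close to 3/4
```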


8.3 Multi-Dimensional Networks 


Most systems of epistemic justification are of course much more compli- 
cated than the one-dimensional chains and loops that we have considered so 
far. Certainly modern coherentism envisages many-dimensional nets of in- 
terlocking probabilistic relations. The concept of justification trees or J-trees 


7 The fact that this value of P(q) is the same as that of the loop (8.3), in the usual 
class, is just a coincidence. 
has been introduced as a graphic representation of the relation in such
networks.⁸ Figure 8.1 is an example of a very simple justification tree. This
tree has two branches, with $A_1$ and $A'_1$ as nodes on the one level, and $A_2$ and
$A'_2$ as nodes on a lower level. It should be read as: proposition q is justified
by $A_1$ and $A'_1$, $A_1$ is justified by $A_2$, and $A'_1$ is justified by $A'_2$. In this section
we shall describe what happens when we replace the finite or infinite one- 
dimensional probabilistic chain by a finite or infinite probabilistic network 
in two dimensions, along the lines of a justification tree. 


Fig. 8.1 Basic justification tree 


We now make the tree more complicated by allowing that $A_1$ and $A'_1$ are each
supported by two propositions rather than by one, as depicted in Fig. 8.2.


Fig. 8.2 Complex justification tree


Here $A_1$ is supported by $A_2$ and $A'_2$, and $A'_1$ is supported by $A''_2$ and $A'''_2$. In
their turn, $A_2$, $A'_2$, $A''_2$ and $A'''_2$ may each be supported by two propositions.
A complicated tree as in Fig. 8.2 could serve as a model for the propagation
of genetic traits under sexual reproduction, in which the traits of a child

8 See for example Sosa 1979; Clark 1988, 374-375; Alston 1989, 19-38; Cortens 
2002, 25-26; Aikin 2011, 74. 
are related probabilistically to those of both the mother and the father. Let
P(q) again be the unconditional probability that Barbara has trait T. This 
time Barbara is not a bacterium as in Section 3.7, where the reproduction 
was asexual. Rather she is now an organism with two parents, a father and a 
mother. For the purpose of fixing ideas it will prove convenient to talk about 
sexual reproduction and about fathers and mothers, but we should bear in 
mind that the formalism is of course much more general. Also, although we 
shall tell the story in terms of events, it should be kept in mind that everything 
we say applies to justificatory relations between propositions as well. 

Since Barbara stems from two parents, the probability that she has T is
determined by the characteristics of her mother and of her father. Rather than 
two reference classes (the mother having or not having T), we now have four: 
both the mother and the father have T, neither of them has it, the father has 
T but the mother does not, and the mother has T but the father does not. The 
analogue of the rule of total probability is 


$$P(q) = \alpha_0 P(A_1 \wedge A'_1) + \beta_0 P(\neg A_1 \wedge \neg A'_1) + \gamma_0 P(A_1 \wedge \neg A'_1) + \delta_0 P(\neg A_1 \wedge A'_1)\,, \qquad (8.7)$$


where $A_1$ represents Barbara’s mother having T and $A'_1$ her father having T.
Here $\alpha_0$ means “the probability that Barbara has T, given that her mother
and father both have T”. The other conditional probabilities are analogously
defined: $\beta_0$ corresponds to neither parent having T, and $\gamma_0$ and $\delta_0$ to the two
situations in which one parent does, and the other does not have T.

In the nth generation the corresponding expression is 


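$$P(A_n) = \alpha_n P(A_{n+1} \wedge A'_{n+1}) + \beta_n P(\neg A_{n+1} \wedge \neg A'_{n+1}) + \gamma_n P(A_{n+1} \wedge \neg A'_{n+1}) + \delta_n P(\neg A_{n+1} \wedge A'_{n+1})\,, \qquad (8.8)$$


where $A_n$ stands for one individual in the nth generation, and $A_{n+1}$ and $A'_{n+1}$ for
that individual’s mother and father. The conditional probabilities are


$$\begin{aligned}
\alpha_n &= P(A_n|A_{n+1} \wedge A'_{n+1})\\
\beta_n &= P(A_n|\neg A_{n+1} \wedge \neg A'_{n+1})\\
\gamma_n &= P(A_n|A_{n+1} \wedge \neg A'_{n+1})\\
\delta_n &= P(A_n|\neg A_{n+1} \wedge A'_{n+1})\,.
\end{aligned}$$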


In order to iterate the two-dimensional (8.8), much as we did in the
one-dimensional case, we now need more complicated relations for the
unconditional probabilities. It is no longer sufficient to consider $P(A_1)$ and replace it
by $\beta_1 + (\alpha_1 - \beta_1)P(A_2)$, and so on, for now we are dealing with the
probability of a conjunction of two parents, $A_1$ and $A'_1$. Each of these parents has
two parents, so we encounter in fact the probabilities of conjunctions of four 
individuals. This can be continued further and further, involving more and 
more progenitors, confronting us with a tree of increasing complexity. 

Fortunately, however, we can often make simplifying assumptions. Here 
we will work under three simplifications: 


1. Independence. The probabilities for the occurrence of the trait T in
females and in males are independent of one another in any of the n
generations:

$$P(A_{n+1} \wedge A'_{n+1}) = P(A_{n+1})\,P(A'_{n+1})\,.$$


This assumption seems reasonable in the genetic context; and it will also 
apply in many more general epistemological settings. 


2. Gender symmetry. The probability of the occurrence of the trait T is the
same for females and for males in any of the n generations:

$$P(A_n) = P(A'_n)\,.$$

Thus we only consider inheritable traits which are gender-independent, 
such as having blue eyes or being red-haired, and not, for example, having 
breast cancer or being taller than two metres. Similarly, in an epistemolog- 
ical context this assumption will sometimes, but not always be satisfied. 
With this assumption the prime can be dropped on $A'_n$, and in combination
with the first assumption we obtain


$$P(A_{n+1} \wedge A'_{n+1}) = P(A_{n+1})\,P(A'_{n+1}) = P^2(A_{n+1})\,.$$


3. Uniformity. The conditional probabilities are the same in any of the n
generations. That is, $\alpha_n$, $\beta_n$, $\gamma_n$ and $\delta_n$ are independent of n, so we may
drop the suffix.


Together these assumptions enable us to simplify (8.8) to the quadratic func- 
tion 


$$P(A_n) = \alpha P^2(A_{n+1}) + \beta P^2(\neg A_{n+1}) + (\gamma + \delta)\,P(A_{n+1})\,P(\neg A_{n+1})\,. \qquad (8.9)$$


As we will show in the next section, (8.9) leads to a surprising result, for it 
generates a structure similar to the Mandelbrot fractal. 
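The reduction from (8.8) to (8.9) can be checked mechanically. In this sketch (function names hypothetical) the four-term rule is evaluated with independent parents whose probabilities are equal, and compared with the quadratic form at random values of the conditional probabilities. Here γ is taken to be the case "mother has T, father not" and δ the reverse; that labelling is our assumption and does not affect the result, since only γ + δ enters (8.9).

```python
import random

def four_term(alpha, beta, gamma, delta, p_mother, p_father):
    """Eq. (8.8) with independent parents: one term per reference class."""
    return (alpha * p_mother * p_father                 # both parents have T
            + beta * (1 - p_mother) * (1 - p_father)    # neither has T
            + gamma * p_mother * (1 - p_father)         # only the mother has T
            + delta * (1 - p_mother) * p_father)        # only the father has T

def quadratic(alpha, beta, gamma, delta, p):
    """Eq. (8.9): gender symmetry sets P(A_{n+1}) = P(A'_{n+1}) = p."""
    return alpha * p**2 + beta * (1 - p)**2 + (gamma + delta) * p * (1 - p)

random.seed(0)
for _ in range(1000):
    a, b, g, d, p = (random.random() for _ in range(5))
    assert abs(four_term(a, b, g, d, p, p) - quadratic(a, b, g, d, p)) < 1e-12
print("(8.8) and (8.9) agree on 1000 random cases")
```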
8.4 The Mandelbrot Fractal


In 1977 Mandelbrot introduced his celebrated iteration: 
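$$q_{n+1} = c + q_n^2\,, \qquad (8.10)$$


where c and q are complex numbers.⁹ Starting with $q_0 = 0$, the iteration gives
us successively


$$\begin{aligned}
q_1 &= c\\
q_2 &= c + c^2\\
q_3 &= c + (c + c^2)^2\\
q_4 &= c + \big(c + (c + c^2)^2\big)^2\,,
\end{aligned} \qquad (8.11)$$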


and so on. For many values of c, the iteration will diverge, allowing qn to 
grow beyond any bound as n becomes larger and larger. For example, if c = 1 
we obtain qı = 1, q2 = 2, q3 = 5 and q4 = 26, and so on. 

But if for instance c = 0.1, then $q_n$ does not diverge, and in this case
actually converges to the number 0.11271 .... Taken together, all the values 
of c for which the iteration (8.10) does not diverge form the Mandelbrot set, 
which is reproduced in Figure 8.3. 
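The behaviour for real c is easy to reproduce. The following sketch iterates (8.10) from q_0 = 0; the divergence bound of 10^6 is a crude cutoff of our own choosing (any bound above 2 would also signal divergence, since once |q_n| > 2 the orbit escapes). For c = 0.1 the limit is the smaller root of q = c + q², namely (1 − √(1 − 4c))/2.

```python
def mandelbrot_orbit(c, steps, bound=1e6):
    """Iterate q_{n+1} = c + q_n**2 from q_0 = 0, as in (8.10)-(8.11).

    Stops early once |q_n| exceeds `bound` (a crude divergence test).
    """
    q, orbit = 0, []
    for _ in range(steps):
        q = c + q * q
        orbit.append(q)
        if abs(q) > bound:
            break
    return orbit

print(mandelbrot_orbit(1, 4))           # [1, 2, 5, 26]
print(mandelbrot_orbit(0.1, 200)[-1])   # approaches 0.11271...
```

The same function accepts a complex `c`, since `abs` and arithmetic work on Python complex numbers.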


Fig. 8.3 The Mandelbrot fractal is generated by the complex quadratic
iteration $q_n = c + q_{n-1}^2$, where $c = x + iy$.


9 Mandelbrot 1977. The variables $q_n$ in this section should not be confused with q
in (8.7), the target proposition of the two-dimensional net. 
The black area contains the points that belong to the Mandelbrot set.
Each point corresponds to a complex number, c, being the ordered pair of 
the Cartesian coordinates, (x,y). The edge of the Mandelbrot set forms the 
boundary between those values of c that are members of the set and those 
that are not. It is this boundary, the ‘Mandelbrot fractal’, that has the well- 
known property of being infinitely structured in a remarkable way: no matter 
how far you zoom in on it, you will always find a new structure that is similar 
to, although not completely identical with the Mandelbrot set itself. 

Our aim in this section is to demonstrate that, on condition that
$\alpha + \beta \neq \gamma + \delta$, the quadratic relation (8.9) is equivalent to the Mandelbrot iteration
(8.10). As it turns out, c will be a function of the conditional probabilities
$\alpha$, $\beta$, $\gamma$ and $\delta$ alone, and will thus be a known quantity. The q’s, on the
other hand, will be directly related to the unconditional probabilities; these 
are unknown and their values are to be determined through the iteration. 

It will prove convenient first to define $\varepsilon$ as the average of the conditional
probabilities $\gamma$ and $\delta$, that is

$$\varepsilon = \tfrac{1}{2}(\gamma + \delta)\,,$$
which is the mean conditional probability that the target — in our case Bar- 
bara — has the trait T, given that only one of her parents has T. Eq.(8.9) now 
becomes 


$$P(A_n) = \beta + 2(\varepsilon - \beta)P(A_{n+1}) + (\alpha + \beta - 2\varepsilon)P^2(A_{n+1})\,. \qquad (8.12)$$


On the one hand, this iteration may not look very much like the Mandelbrot
form (8.10). Firstly, in the latter we go as it were upwards, starting from $q_n$
and then counting to $q_{n+1}$, whereas in (8.12) we start with $P(A_{n+1})$ and
iterate downwards to $P(A_n)$. Secondly, (8.12) is about conditional and
unconditional probabilities, and thus about real numbers between zero and one,
whereas (8.10) is an uninterpreted formula involving complex numbers. On
the other hand, however, we see that there is an important similarity between
(8.10) and (8.12). Both are quadratic expressions: the former contains $q_n^2$ and
the latter $P^2(A_{n+1})$. In order to transform (8.12) into (8.10) we introduce a
linear mapping that serves to remove from (8.12) the term $2(\varepsilon - \beta)P(A_{n+1})$,
and also the coefficient $(\alpha + \beta - 2\varepsilon)$. The appropriate linear mapping that
does the trick, $P(A_n) \to q_n$, is defined by


$$q_n = (\alpha + \beta - 2\varepsilon)P(A_n) - \beta + \varepsilon\,. \qquad (8.13)$$


On substituting (8.12) for $P(A_n)$ in (8.13) we obtain a formula that can be
rewritten as

$$q_n = \varepsilon(1 - \varepsilon) - \beta(1 - \alpha) + q_{n+1}^2\,. \qquad (8.14)$$

The details of this calculation can be found in Appendix D.2.
Now define

$$c = \varepsilon(1 - \varepsilon) - \beta(1 - \alpha)\,. \qquad (8.15)$$


Note that c involves only the conditional probabilities, $\alpha$, $\beta$ and $\varepsilon$, and so
is an invariant quantity during the execution of the iteration. On the other
hand, $q_n$ also contains the unconditional probability, $P(A_n)$, which we seek to
evaluate through the iteration. With the definition (8.15), Eq. (8.14) becomes


$$q_n = c + q_{n+1}^2\,. \qquad (8.16)$$


Evidently (8.16) is very similar to the standard Mandelbrot iteration (8.10). 
There is only the one difference which we have already mentioned: instead 
of an iteration upwards from n = 0, the iteration in (8.16) proceeds from a 
large n value, corresponding to the primeval parents, down to the target child 
proposition at n = 0. This difference is however only cosmetic and has no 
significance for the iteration as such. 
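The claim that the linear map (8.13) turns (8.12) into (8.16) can be spot-checked numerically. The sketch below (helper names are ours) draws random values with 0 < β < α < 1 and an arbitrary ε, performs one step of (8.12), maps both sides with (8.13), and confirms that q_n = c + q_{n+1}² with c taken from (8.15). This is a check, not a proof; the algebra is given in Appendix D.2.

```python
import random

def step_P(p_next, alpha, beta, eps):
    """Eq. (8.12): P(A_n) from P(A_{n+1})."""
    return beta + 2 * (eps - beta) * p_next + (alpha + beta - 2 * eps) * p_next**2

def to_q(p, alpha, beta, eps):
    """Linear map (8.13): q_n = (alpha + beta - 2 eps) P(A_n) - beta + eps."""
    return (alpha + beta - 2 * eps) * p - beta + eps

random.seed(1)
for _ in range(1000):
    beta, alpha = sorted(random.random() for _ in range(2))   # beta < alpha
    eps, p_next = random.random(), random.random()
    c = eps * (1 - eps) - beta * (1 - alpha)                  # Eq. (8.15)
    q_n = to_q(step_P(p_next, alpha, beta, eps), alpha, beta, eps)
    q_next = to_q(p_next, alpha, beta, eps)
    assert abs(q_n - (c + q_next**2)) < 1e-12                 # Eq. (8.16)
print("(8.13) maps (8.12) onto (8.16) in all sampled cases")
```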

We are now in a position to take advantage of some of the lore that has ac- 
cumulated about the Mandelbrot iteration. Some but not all, for there is still 
the second difference that we mentioned: epistemic justification as we dis- 
cuss it here deals with probabilities, and those are real numbers, rather than 
complex ones. Hence we must concentrate on the real subset of the complex 
numbers c in (8.15), namely those for which c = (x,0), corresponding to the 
x-axis in Figure 8.3. It should be noted that, when c is real, all the qn are auto- 
matically real — compare the explicit expressions for the first few n-values, 
just after (8.11). It is known that the real interval $-2 \le c \le \tfrac{1}{4}$ lies within
the Mandelbrot set, but not all of these values correspond to an iteration that 
converges to a unique limiting value. 

However, let us now impose the condition of probabilistic support, with
exclusion of zero and one. Although $0 < \beta < \alpha < 1$ has the same form as the
condition of probabilistic support for the one-dimensional chain, it should
be realized that $\alpha$ and $\beta$ do not have quite the same meanings in the two
contexts. In the one-dimensional chain, $\alpha > \beta$ means that the probability of
the child’s having trait T is greater if the mother has it than if the mother
does not have it. In the two-dimensional net, however, $\alpha > \beta$ means that the
probability of the child’s having trait T is greater if both of her parents have
it than if neither of them does.

The essential point is that with $0 < \beta < \alpha < 1$ we can show from (8.15)
that $-\tfrac{1}{4} < c < \tfrac{1}{4}$ (see again Appendix D.2). In this domain the Mandelbrot
iteration is known to converge to a unique limit. Were it not for probabilistic
support, convergence would not be guaranteed; indeed, a so-called two-cycle,
in which $q_n$ flips incessantly between two values, would have been a
possibility. Hence the condition of probabilistic support is necessary for convergence
in this case.
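A coarse grid check of the bound −1/4 < c < 1/4 is given below; it is not a proof (that is in Appendix D.2). We scan α, β and ε in steps of 0.02 (a spacing of our choosing), keep only points with 0 < β < α < 1, and record the extremes of c from (8.15). Because ε is the average of two probabilities, it is allowed to range over [0, 1].

```python
from itertools import product

def c_value(alpha, beta, eps):
    """Eq. (8.15): c = eps(1 - eps) - beta(1 - alpha)."""
    return eps * (1 - eps) - beta * (1 - alpha)

grid = [k / 50 for k in range(51)]                  # 0.00, 0.02, ..., 1.00
cs = [c_value(a, b, e)
      for a, b, e in product(grid, repeat=3)
      if 0 < b < a < 1]                             # probabilistic support
print(min(cs), max(cs))                             # both inside (-1/4, 1/4)
```

On this grid the extremes come close to, but never reach, ±1/4, in line with the strict inequalities.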

A fixed point of the mapping (8.16) is a number, $q_*$, that satisfies

$$q_* = c + q_*^2\,. \qquad (8.17)$$

In Appendix D it is proved that the solution


$$q_* = \frac{1}{2} - \sqrt{\frac{1}{4} - c} \qquad (8.18)$$


is the so-called attracting fixed point of (8.16), meaning that the iteration
(8.16) converges to $q_*$. Independently of the value one takes as the starting
point for the iteration (i.e. $q_N$ for some large N), attraction to the same $q_*$
takes place, on condition that the starting point is not too far from $q_*$
(technically, the condition is that it is within the basin of attraction of the fixed
point). Under these conditions the starting point or ground has no effect on
the final value of the target, $q_0$. The phenomenon is precisely that of fading
foundations, now in the context of a two-dimensional net.
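Fading foundations in the two-dimensional net can be illustrated as follows. With hypothetical conditional probabilities α = 0.9, β = 0.2 and ε = 0.6 (our choice, so that c is about 0.22), the sketch maps three very different ground values P(A_N) to q_N via (8.13), runs (8.16) down to q_0, and converts back to the target probability. All three runs end at the attracting fixed point (8.18).

```python
from math import sqrt

def iterate_q(c, q_start, steps):
    """Apply q_n = c + q_{n+1}**2, Eq. (8.16), `steps` times, starting from q_N."""
    q = q_start
    for _ in range(steps):
        q = c + q * q
    return q

# Hypothetical conditional probabilities (our choice), with 0 < beta < alpha < 1.
alpha, beta, eps = 0.9, 0.2, 0.6
k = alpha + beta - 2 * eps                  # nonzero: alpha + beta != gamma + delta
c = eps * (1 - eps) - beta * (1 - alpha)    # Eq. (8.15), about 0.22
q_star = 0.5 - sqrt(0.25 - c)               # Eq. (8.18)

for ground in (0.0, 0.5, 1.0):              # P(A_N) for a remote ancestor
    q_N = k * ground - beta + eps           # Eq. (8.13) at n = N
    q_0 = iterate_q(c, q_N, steps=200)
    target = (q_0 + beta - eps) / k         # invert (8.13) at n = 0
    print(ground, round(q_0 - q_star, 12), round(target, 6))
```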

This fixed point (8.18) corresponds to the following fixed point of (8.12): 


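$$p_* = \frac{\beta}{\beta + \frac{1}{2} - \varepsilon + \sqrt{\beta(1 - \alpha) + \left(\varepsilon - \frac{1}{2}\right)^2}}\,. \qquad (8.19)$$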


Note that, if $\varepsilon = \tfrac{1}{2}(\alpha + \beta)$, which is equivalent to $\alpha + \beta = \gamma + \delta$, $p_*$
reduces to $\beta/(1 - \alpha + \beta)$, and this agrees with the sum of the one-dimensional
iteration (3.17).
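The closed form (8.19) can be compared with brute-force iteration of (8.12). In the sketch below the values α = 0.8 and β = 0.3 are illustrative choices of ours. The third value of ε equals ½(α + β), where (8.12) becomes linear and p_* should coincide with the one-dimensional β/(1 − α + β).

```python
from math import sqrt

def p_star(alpha, beta, eps):
    """Closed-form fixed point (8.19) of the two-parent iteration (8.12)."""
    return beta / (beta + 0.5 - eps + sqrt(beta * (1 - alpha) + (eps - 0.5) ** 2))

def iterate_P(alpha, beta, eps, ground, steps=500):
    """Iterate (8.12) downwards from a remote ancestor with P(A_N) = ground."""
    p = ground
    for _ in range(steps):
        p = beta + 2 * (eps - beta) * p + (alpha + beta - 2 * eps) * p * p
    return p

alpha, beta = 0.8, 0.3                        # illustrative values only
for eps in (0.35, 0.6, (alpha + beta) / 2):   # last choice: alpha + beta = gamma + delta
    print(eps, p_star(alpha, beta, eps), iterate_P(alpha, beta, eps, ground=0.5))
print(beta / (1 - alpha + beta))              # one-dimensional value, cf. (3.17)
```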

If β tends to zero the solution (8.19) is interesting, for it vanishes only
if $\varepsilon \le \tfrac{1}{2}$. If $\varepsilon > \tfrac{1}{2}$ it tends to the nontrivial value $(2\varepsilon - 1)/(2\varepsilon - \alpha)$ (see
Appendix D.2). This behaviour is different from that of the one-dimensional
case, in which the solution always vanishes when β tends to zero.
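The limiting behaviour as β → 0 is easy to see numerically. The sketch evaluates (8.19) for shrinking β at one value of ε below ½ and one above, with α = 0.9 chosen purely for illustration. For ε = 0.7 the predicted limit is (2ε − 1)/(2ε − α) = 0.4/0.5 = 0.8.

```python
from math import sqrt

def p_star(alpha, beta, eps):
    """Fixed point (8.19) of the two-parent iteration (8.12)."""
    return beta / (beta + 0.5 - eps + sqrt(beta * (1 - alpha) + (eps - 0.5) ** 2))

alpha = 0.9                                   # illustrative value only
for eps in (0.4, 0.7):
    for beta in (1e-2, 1e-4, 1e-6):
        print(eps, beta, p_star(alpha, beta, eps))
print((2 * 0.7 - 1) / (2 * 0.7 - alpha))      # predicted limit for eps = 0.7
```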

The two-dimensional network is generated by the same recursion that pro- 
duces the Mandelbrot set in the complex plane. True, we have only to do with 
the real line between $-\tfrac{1}{4}$ and $\tfrac{1}{4}$ and not with the complex plane (where the
remarkable fractal structure is apparent). But the point is that the algorithm
which produces our sequence of probabilities, and that which generates the
Mandelbrot fractal, are the same.

We have used three simplifying assumptions in proving the above
properties, viz. those of independence, probabilistic symmetry between $A_{n+1}$ and
$A'_{n+1}$, and uniformity. There are however strong indications that essentially
similar results also hold when these assumptions are dropped. Imagine a
situation in which the probabilities are different for $A_{n+1}$ and $A'_{n+1}$. Then
there will be two coupled quadratic iterations, one for $P(A_n)$ and one for
$P(A'_n)$. Each of these is related to $P(A_{n+1})$ as well as $P(A'_{n+1})$. This is
however merely a technical complication, for it is still possible to find a domain
in which the iterations converge. The relation is in fact a generalized
Mandelbrot iteration, and analogous results obtain.

The same applies if we drop the assumption of independence. Clearly, 
if $A_{n+1}$ and $A'_{n+1}$ are stochastically dependent, we may have to include
more distant links in the network, which of course complicates matters 
considerably. However, in general terms it means nothing more than that 
the final fixed-point equations will be of higher order. Again a generalized 
Mandelbrot-style iteration will hold sway, and again domains of convergence 
will exist. 

Furthermore, in many situations the conditional probabilities may not be 
uniform: they may change from generation to generation. In those cases the 
iteration will become considerably more involved. We have seen that for 
the one-dimensional chain it proved possible to write down explicitly the 
result of concatenating an arbitrary number of steps. It is true that for a two- 
dimensional net this would be very cumbersome. However, with the use of a 
fixed-point theorem it is possible to give conditions under which convergence 
once more occurs. 

What will happen when the network has more dimensions than two? In 
that case the fixed-point equations will be of even higher order, necessitat- 
ing computer programs for their calculation. The picture itself however re- 
mains essentially the same. The probabilities are determined by polynomial 
recurrent expressions, and there will be a domain in which they are uniquely 
determined. 

We conclude that probabilistic epistemic justification has a structure that 
gives rise to a generalized Mandelbrot recursion. This still holds when 
we abandon our three simplifying assumptions, or when we work in more 
than two dimensions. In short, not only do the algorithms describing ferns, 
snowflakes and many other patterns in nature generate a fractal, but the same 
is true for the description of our patterns of reasoning. 
8.5 Mushrooming Out


Consider once more our justificatory chain in one dimension 


$$q \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \ldots$$


where the arrow is again interpreted as probabilistic support. Above we have 
constructed multi-dimensional networks by letting new chains spring from 
the nodes, that is the unconditional probabilities. However chains can also 
arise from the connections, that is from the arrows. This possibility seems to
have been anticipated by Richard Fumerton.

Fumerton has observed that many examples of sceptical reasoning rely 
on a principle which he calls the Principle of Inferential Justification. The 
principle consists of two clauses: 


To be justified in believing one proposition q on the basis of another
proposition $A_1$, one must be (1) justified in believing $A_1$ and (2) justified in believing
that $A_1$ makes probable q.¹⁰


He then argues that, ironically, the same principle is used to reject scepticism 
and to support classic foundationalism: 


The foundationalist holds that every justified belief owes its justification ul- 
timately to some belief that is noninferentially justified. ... The principle of 
inferential justification plays an integral role in the famous regress argument 
for foundationalism. If all justification were inferential, the argument goes, 
we would have no justification for believing anything whatsoever. If all jus- 
tification were inferential, then to be justified in believing some proposition 
q I would need to infer it from some other proposition $A_1$. According to the
first clause of the principle of inferential justification, I would be justified in
believing q on the basis of $A_1$ only if I were justified in believing $A_1$. But
if all justification were inferential I would be justified in believing $A_1$ only
if I believed it on the basis of something else $A_2$, which I justifiably believe
on the basis of something else $A_3$, which I justifiably believe on the basis of
something else $A_4$, ..., and so on ad infinitum. Finite minds cannot complete
an infinitely long chain of reasoning, so if all justification were inferential we
would have no justification for believing anything.¹¹


10 Fumerton 1995, 36; 2001, 6. We have substituted q and $A_1$ for Fumerton’s P and
E. Fumerton applies the principle in particular to scepticism of what he calls the 
“strong” and “local” kind (Fumerton 1995, 29-31). Strong scepticism denies that 
we can have justified or rational belief; it is opposed to weak scepticism, which 
denies that we can have knowledge. Local scepticism is scepticism with respect to 
a given class of propositions, whereas global scepticism denies that we can know or 
rationally believe all truth. 

11 Fumerton 1995, 56-57. 
We recognize here the finite mind objection to infinite justificatory chains,
which we discussed in Chapter 5. This objection, which serves as an argument
in support of foundationalism, alludes to the first clause of the Principle of
Inferential Justification, and it constitutes the first part of Fumerton’s
epistemic regress argument for foundationalism.¹² There is however a second
part to Fumerton’s epistemic regress argument. This part depends on the 
second clause of the Principle of Inferential Justification, and it has to do 
with multi-dimensionality arising from chains that spring from connections 
rather than from nodes. Here again an infinite number of infinite regresses 
mushroom out in infinitely many directions: 


To be justified in believing q on the basis of $A_1$, we must be justified in
believing $A_1$. But we must also be justified in believing that $A_1$ makes probable
q. And if all justification is inferential, then we must justifiably infer that $A_1$
makes probable q from some proposition $B_1$, which we justifiably infer from
some proposition $B_2$, and so on. We must also justifiably believe that $B_1$ makes
probable that $A_1$ makes probable q, so we would have to infer that from some
proposition $C_1$, which we justifiably infer from some proposition $C_2$, and so
on. And we would have to infer that $C_1$ makes probable that $B_1$ makes
probable that $A_1$ makes probable q ... The infinite regresses are mushrooming out
in an infinite number of different directions.¹³


The consequences of this particular mushrooming out seem to be bleak indeed, as Fumerton notes:


If finite minds should worry about the possibility of completing one infinitely long chain of reasoning, they should be downright depressed about the possibility of completing an infinite number of infinitely long chains of reasoning.¹⁴


Fortunately, however, things are not as grim as Fumerton suggests. On the contrary, the situation is very interesting, for Fumertonian mushrooming out generates a Mandelbrot-like iteration of the sort that we described in the previous section.

Let us explain. In the previous chapters we have thought of the conditional probabilities as somehow being given: they were measured or estimated, for instance in a laboratory, as in our example about the bacteria. With given conditional probabilities there is of course no Fumertonian mushrooming out: taking the conditional probabilities as our pragmatic starting point, we can iterate the rule of total probability for the unconditional probabilities in the usual way. However, Fumerton is right to intimate that sometimes the conditional probabilities are unknown, or at least uncertain; then their values have to be justified by some further proposition, which has to be justified by yet another proposition, and so on, and we are faced with mushrooming in Fumerton’s sense. How are we to deal with this situation?


12 For Fumerton’s distinction between the epistemic and the conceptual regress argument for foundationalism, see Section 6.1. There we argued that the conceptual regress argument amounts to the no starting point objection to infinite epistemic chains.

13 Fumerton 1995, 57. $B_1$, $C_1$ etc. come in the place of Fumerton’s $F_1$, $G_1$.

14 Ibid.
Again let $q$ be probabilistically supported by $A_1$:

$$P(q|A_1) > P(q|\neg A_1).$$


Now suppose that these two conditional probabilities are not given. The only thing we know is that “$q$ is probabilistically supported by $A_1$” is in turn made probable by another proposition, for example by $B_1$. The way to express this is to write down the relevant rules of total probability, this time for conditional rather than unconditional probabilities:

$$\begin{aligned} P(q|A_1) &= P(q|A_1\wedge B_1)\,P(B_1|A_1) + P(q|A_1\wedge\neg B_1)\,P(\neg B_1|A_1)\\ P(q|\neg A_1) &= P(q|\neg A_1\wedge B_1)\,P(B_1|\neg A_1) + P(q|\neg A_1\wedge\neg B_1)\,P(\neg B_1|\neg A_1). \end{aligned} \tag{8.20}$$


These rules are clearly more complicated than the simple rule for an unconditional probability, although we already encountered this form in (7.26) of Chapter 7, when we discussed our model for higher-order probabilities.¹⁵

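The unconditional probability $P(q)$ can be written as

$$P(q) = P(q|A_1)\,P(A_1) + P(q|\neg A_1)\,P(\neg A_1),$$

and on using (8.20) to evaluate the two conditional probabilities, we find that

$$\begin{aligned} P(q) &= \big[P(q|A_1\wedge B_1)P(B_1|A_1) + P(q|A_1\wedge\neg B_1)P(\neg B_1|A_1)\big]P(A_1)\\ &\quad + \big[P(q|\neg A_1\wedge B_1)P(B_1|\neg A_1) + P(q|\neg A_1\wedge\neg B_1)P(\neg B_1|\neg A_1)\big]P(\neg A_1)\\ &= \alpha_0 P(A_1\wedge B_1) + \gamma_0 P(A_1\wedge\neg B_1) + \delta_0 P(\neg A_1\wedge B_1) + \beta_0 P(\neg A_1\wedge\neg B_1), \end{aligned}$$

where $\alpha_0 = P(q|A_1\wedge B_1)$, $\gamma_0 = P(q|A_1\wedge\neg B_1)$, $\delta_0 = P(q|\neg A_1\wedge B_1)$ and $\beta_0 = P(q|\neg A_1\wedge\neg B_1)$.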


The last line has precisely the structure of (8.7), reading $B_1$ here for $A'_1$ there. This shows that a single mushrooming out à la Fumerton is isomorphic to the two-dimensional equations of the previous section.
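The identity (8.20) and the four-term expansion of $P(q)$ can be checked numerically. The Python sketch below is only an illustration under our own assumptions: it draws an arbitrary joint distribution over the three binary propositions $q$, $A_1$, $B_1$ (the random weights are hypothetical, not data from the text) and compares both sides of each equation. The names `alpha0`, `gamma0`, `delta0` and `beta0` are our labels for the four conditional probabilities of $q$ that multiply the conjunctions.

```python
import itertools
import random

# Hypothetical joint distribution over the binary propositions (q, A1, B1).
random.seed(1)
raw = {w: random.random() + 0.01 for w in itertools.product([True, False], repeat=3)}
norm = sum(raw.values())
joint = {w: p / norm for w, p in raw.items()}


def P(event, given=lambda q, a, b: True):
    """P(event | given); events are predicates on the truth values (q, a, b)."""
    num = sum(p for w, p in joint.items() if event(*w) and given(*w))
    den = sum(p for w, p in joint.items() if given(*w))
    return num / den


def Q(q, a, b): return q
def A1(q, a, b): return a
def nA1(q, a, b): return not a
def B1(q, a, b): return b
def nB1(q, a, b): return not b
def AND(f, g): return lambda q, a, b: f(q, a, b) and g(q, a, b)


# Eq. (8.20): total probability applied inside the conditional probabilities.
lhs_pos = P(Q, A1)
rhs_pos = P(Q, AND(A1, B1)) * P(B1, A1) + P(Q, AND(A1, nB1)) * P(nB1, A1)
lhs_neg = P(Q, nA1)
rhs_neg = P(Q, AND(nA1, B1)) * P(B1, nA1) + P(Q, AND(nA1, nB1)) * P(nB1, nA1)

# Four-term expansion of P(q); each coefficient is a conditional probability of q.
alpha0, gamma0 = P(Q, AND(A1, B1)), P(Q, AND(A1, nB1))
delta0, beta0 = P(Q, AND(nA1, B1)), P(Q, AND(nA1, nB1))
four_term = (alpha0 * P(AND(A1, B1)) + gamma0 * P(AND(A1, nB1))
             + delta0 * P(AND(nA1, B1)) + beta0 * P(AND(nA1, nB1)))

print(abs(lhs_pos - rhs_pos), abs(lhs_neg - rhs_neg), abs(P(Q) - four_term))
```

All three printed differences should be of the order of floating-point rounding, whatever seed is chosen, since the identities hold for every joint distribution.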


15 An intuitive way of seeing that (8.20) is correct is to realize that, in the reduced probability space in which $A_1$ (or $\neg A_1$) is the whole space, all the occurrences of $A_1$ (or $\neg A_1$) can be omitted. Then (8.20) reduces to the rule of total probability for an unconditional probability.
We have seen that, where chains spring from the nodes, the two-dimensional equations could be extended to equations in many, and even infinitely many, dimensions, yielding a Mandelbrot structure. The same reasoning can be applied here, where chains spring from the connections. If many, or even a denumerable infinity, of the conditional probabilities are in turn probabilistically supported, then we are dealing with the many-dimensional generalization.

Of course we will never deal with all these dimensions in reality. Our result is first and foremost a formal one. Having said this, we should not underestimate the relevance of formal results for real-life justification. Although it is true that in justifying our beliefs we can handle only short, finite chains, it is thanks to formal reasoning that we can recognize in these chains the manifestation of fading foundations: only through formal proofs do we know that what we see in real-life justification is not a fluctuation or a coincidence.¹⁶


8.6 Causal Graphs 


In the first chapter we briefly referred to the similarities between epistemic 
and causal chains. Especially at a formal level, as we stressed in Chapter 2, 
a chain of reasons and a chain of causes are very much alike. Thus the linear 
chain 


$$A_0 \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots \tag{8.21}$$


can be interpreted as a one-dimensional causal series, where $A_0$ is the fact or event (rather than the proposition) that bacterium Barbara from Chapter 3 has trait T, $A_1$ is the fact or event that her mother had T, and so on, backwards in time. The arrows in (8.21) stand for probabilistic causal influences: if a mother has T, it is more likely, but not certain, that her daughter will have T. This is in line with ordinary usage, for example when one says that smoking causes lung cancer, even though one knows that not all smokers contract the affliction, and that some non-smokers succumb to it. To avoid cumbersome language, we shall sometimes say that $A_0$ stands for Barbara (rather than for the fact that Barbara has T), that $A_1$ stands for her mother (rather than for the fact that her mother has T), and so on.


16 As the size and complexity of the multi-dimensional networks increase, it will become more and more difficult to have them correspond to empirically based conditional probabilities. A rather wild speculation is that in the end such a world-network might have only one solution. See Atkinson and Peijnenburg 2010c, where we mull over the implications of such a speculation, taking as our starting point Susan Haack’s crossword metaphor for “foundherentism” (Haack 1993, Chapter 4).

In the language of Directed Acyclic Graphs (DAGs) one would say that (8.21) is a DAG just in case the Markov condition holds.¹⁷ This means in particular that $A_1$ screens off $A_0$ from $A_2$ in the sense of Reichenbach, that $A_2$ screens off $A_1$ from $A_3$, and so on.¹⁸ However, the Markov condition is much stronger than a screening-off constraint that involves only three successive events. The idea is that the ‘parent event’ of a ‘child event’ screens off the child from any and all ‘ancestor events’, or combinations thereof. For the chain of (8.21), the condition is formally as follows:


$$\begin{aligned} P(A_n|A_{n+1}\wedge Z) &= P(A_n|A_{n+1})\\ P(A_n|\neg A_{n+1}\wedge Z) &= P(A_n|\neg A_{n+1}) \end{aligned}$$

for all $n \geq 0$. Here $Z$ stands for any event, $A_m$, in the chain beyond the parent $A_{n+1}$, i.e. for any $A_m$ with $m \geq n+2$, or for any conjunction of such events, or their negations. This can be written succinctly as


$$P(A_n|\pm A_{n+1}\wedge Z) = P(A_n|\pm A_{n+1}),$$

where it is understood that $+A_{n+1}$ simply means $A_{n+1}$, and $-A_{n+1}$ means $\neg A_{n+1}$. The idea, informally, is that the Markov condition ensures that the causal influences which probabilistically circumscribe Barbara’s genetic condition are determined by her mother alone, so that one can forget about all her ancestors except her mother.
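For a finite stretch of (8.21), the Markov condition can be illustrated by building the joint distribution as a product of link-by-link conditional probabilities and checking that, once the parent is fixed, conditioning on more remote ancestors changes nothing. The sketch below is ours, not the authors’: the values of $\alpha_n = P(A_n|A_{n+1})$, $\beta_n = P(A_n|\neg A_{n+1})$ and $P(A_4)$ are hypothetical, and the chain is simply cut off at $A_4$.

```python
import itertools


def markov_chain_joint(alpha, beta, p_last):
    """Joint distribution of A_0..A_N for the chain A_0 <- A_1 <- ... <- A_N.

    alpha[n] = P(A_n | A_{n+1}), beta[n] = P(A_n | not A_{n+1}), p_last = P(A_N).
    Each child depends only on its parent, as the Markov condition requires.
    """
    N = len(alpha)
    joint = {}
    for xs in itertools.product([True, False], repeat=N + 1):
        p = p_last if xs[N] else 1 - p_last
        for n in range(N):
            c = alpha[n] if xs[n + 1] else beta[n]
            p *= c if xs[n] else 1 - c
        joint[xs] = p
    return joint


def cond(joint, event, given):
    """P(event | given) for predicates on the tuple of truth values."""
    num = sum(p for xs, p in joint.items() if event(xs) and given(xs))
    den = sum(p for xs, p in joint.items() if given(xs))
    return num / den


alpha = [0.9, 0.8, 0.7, 0.85]   # hypothetical alpha_0..alpha_3
beta = [0.2, 0.3, 0.1, 0.25]    # hypothetical beta_0..beta_3
joint = markov_chain_joint(alpha, beta, p_last=0.5)

A0 = lambda xs: xs[0]
# Parent alone versus parent plus more remote ancestors (Z):
with_parent = cond(joint, A0, lambda xs: xs[1])
with_ancestors = cond(joint, A0, lambda xs: xs[1] and xs[2] and not xs[4])
neg_parent = cond(joint, A0, lambda xs: not xs[1])
neg_with_ancestors = cond(joint, A0, lambda xs: not xs[1] and not xs[3])
print(with_parent, with_ancestors, neg_parent, neg_with_ancestors)
```

The first pair should both equal $\alpha_0$ and the second pair $\beta_0$, up to rounding.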

It should be stressed that our analysis of the probabilistic regress in no way requires the imposition of the Markov condition: fading foundations and the emergence of justification in a justificatory regress work just as well with the Markov condition as without it. The causal influence of the primal ancestor fades away as the distance between Barbara and the ancestor increases, and Barbara’s probabilistic tendency to have T emerges from the causal regress, whether or not the Markov condition holds.

It is certainly possible, in a particular causal chain, that fact $A_2$ has a causal influence on $A_0$ directly, apart from its indirect influence through $A_1$. Hesslow has given an example.¹⁹ Birth control pills, $A_2$, directly increase the probability of thrombosis, $A_0$, but indirectly reduce it in sexually active women by reducing the probability of pregnancy, $A_1$, which itself constitutes a thrombosis risk. Then the Markov condition, as we have stated it for (8.21), would break down, and one would have to add a direct causal link between $A_0$ and $A_2$, as shown in Figure 8.4. In this case a modified Markov condition could still be in force: now both $A_1$ and $A_2$ count as parent events of $A_0$, and together they might screen off $A_0$ from the rest of the chain (depending, of course, on the details of the case).


17 Spirtes, Glymour and Scheines 1993; Pearl 2000; Hitchcock 2012.
18 Reichenbach 1956.
19 Hesslow 1976.


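[Figure: the chain $A_0 \leftarrow A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow A_4 \leftarrow \cdots$, with an additional direct arrow from $A_2$ to $A_0$]
Fig. 8.4 Modified causal chain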


An advantage of the above considerations concerning the Markov condition is that they facilitate a demonstration of the consistency of our probabilistic regress.²⁰ This works just as well for the regress of justification as it does for the regress of causes. The idea is that, with the Markov condition in place, one can work out the probability of any conjunction of the $A_n$ in terms of the usual conditional probabilities and the unconditional probabilities of the $A_n$, which, as we know, can be calculated from the conditional probabilities alone (on condition, of course, that the latter are in the usual class). For example, as shown in Appendix A.8,

$$P(A_1\wedge\neg A_3\wedge A_4) = (\beta_1 + \gamma_1\beta_2)(1-\alpha_3)\,P(A_4).$$

So there is a probability distribution over all the conjunctions of events (or propositions), and thus the probabilistic regress is consistent in this sense. If the Markov constraint is not imposed, on the other hand, so that the chain may not be a genuine DAG, then there are in general many ways to distribute probabilities over the various conjunctions; but we can be sure that at least one of them, namely the one fixed by the Markov condition, is consistent.
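The consistency argument can be illustrated on a truncated chain. In the sketch below, which rests on our own assumptions, the joint distribution of $A_0, \ldots, A_4$ is built as a Markov product from hypothetical conditional probabilities, and a hypothetical value of $P(A_4)$ stands in for the infinite tail that a finite program cannot hold. It checks that the result is a genuine probability distribution, that its marginals obey $P(A_n) = \beta_n + \gamma_n P(A_{n+1})$ at every link, and that the example conjunction above matches its closed form.

```python
import itertools


def markov_chain_joint(alpha, beta, p_last):
    """Joint distribution of A_0..A_N built as a Markov product.

    alpha[n] = P(A_n | A_{n+1}), beta[n] = P(A_n | not A_{n+1}), p_last = P(A_N).
    """
    N = len(alpha)
    joint = {}
    for xs in itertools.product([True, False], repeat=N + 1):
        p = p_last if xs[N] else 1 - p_last
        for n in range(N):
            c = alpha[n] if xs[n + 1] else beta[n]
            p *= c if xs[n] else 1 - c
        joint[xs] = p
    return joint


alpha = [0.9, 0.8, 0.7, 0.85]   # hypothetical alpha_0..alpha_3
beta = [0.2, 0.3, 0.1, 0.25]    # hypothetical beta_0..beta_3 (every gamma_n > 0)
p_A4 = 0.5                      # hypothetical P(A_4), standing in for the infinite tail
joint = markov_chain_joint(alpha, beta, p_A4)


def prob(pred):
    return sum(p for xs, p in joint.items() if pred(xs))


# 1. Non-negative weights summing to one: a genuine probability distribution.
is_distribution = min(joint.values()) >= 0 and abs(sum(joint.values()) - 1) < 1e-12

# 2. The marginals obey the rule of total probability at every link.
marg = [prob(lambda xs, n=n: xs[n]) for n in range(5)]
links_ok = all(abs(marg[n] - (beta[n] + (alpha[n] - beta[n]) * marg[n + 1])) < 1e-12
               for n in range(4))

# 3. The example conjunction P(A_1 & not A_3 & A_4) against its closed form.
lhs = prob(lambda xs: xs[1] and not xs[3] and xs[4])
rhs = (beta[1] + (alpha[1] - beta[1]) * beta[2]) * (1 - alpha[3]) * p_A4
print(is_distribution, links_ok, abs(lhs - rhs))
```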

Let us now progress from one to two dimensions. Consider the tree of Figure 8.2 in Section 8.3, but now reinterpreted as a causal net:

Fig. 8.5 Two-dimensional causal net with common causes

Note that, while the direction of epistemic support in Figure 8.2 is from the bottom of the figure to the top, the direction of causal influence in Figure 8.5 is from top to bottom. Thus event $q$ probabilistically causes events $A_1$ and $A'_1$, and $A_1$ in turn causes $A_2$ and $A'_2$, while $A'_1$ causes $A''_2$ and $A'''_2$. For example, $q$ could stand for Barbara’s grandmother, or more accurately for the event that Barbara’s grandmother had T. Through binary fission this grandmother would split into two daughter cells, which would probably, but not certainly, have T. Then $A_1$ could stand for Barbara’s mother, and finally $A_2$ for Barbara herself and $A'_2$ for her sister bacterium. The eventualities $A'_1$, $A''_2$ and $A'''_2$ would have analogous meanings in respect of Barbara’s aunt and her cousins.


20 Herzberg 2013.

One would expect the following Markov condition to hold, namely that $A_1$ screens off $A_2$ and $A'_2$ from all the other events in the net. Thus

$$\begin{aligned} P(A_2|\pm A_1\wedge Z) &= P(A_2|\pm A_1)\\ P(A'_2|\pm A_1\wedge Z) &= P(A'_2|\pm A_1), \end{aligned}$$

where $Z$ can be any of $q$, $A'_1$, $A''_2$ or $A'''_2$ (and also $A'_2$ in the first equation and $A_2$ in the second), or their negations, or any conjunction of the same. Similarly, $A'_1$ screens off $A''_2$ and $A'''_2$ from $q$, $A_1$, $A_2$ and $A'_2$. One would also expect $A_2$ and $A'_2$ to be positively correlated, so

$$P(A_2\wedge A'_2) > P(A_2)\,P(A'_2),$$

although they are conditionally independent in the sense that

$$P(A_2\wedge A'_2|\pm A_1) = P(A_2|\pm A_1)\,P(A'_2|\pm A_1).$$

This equation is in fact a consequence of the Markov condition. Following Reichenbach, we say that $A_1$ is the common cause of $A_2$ and $A'_2$, and that event $A_1$ has brought it about that $A_2$ is more likely to occur if $A'_2$ occurs, and vice versa.
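The combination of unconditional correlation and conditional independence for a common cause shows up in a small numerical model. The sketch below is a hypothetical illustration: it keeps only $A_1$ and its two daughter events $A_2$ and $A'_2$ from Figure 8.5, uses invented probabilities, and builds the joint distribution so that each daughter depends on the mother alone.

```python
import itertools

p_A1 = 0.4                          # hypothetical P(A_1): the mother has T
p_child = {True: 0.8, False: 0.3}   # hypothetical P(daughter has T | mother has / lacks T)

# Joint over (A1, A2, A2'): each daughter depends on the mother only.
joint = {}
for a1, a2, a2p in itertools.product([True, False], repeat=3):
    p = p_A1 if a1 else 1 - p_A1
    for child in (a2, a2p):
        p *= p_child[a1] if child else 1 - p_child[a1]
    joint[(a1, a2, a2p)] = p


def prob(pred):
    return sum(p for w, p in joint.items() if pred(*w))


def cond(pred, given):
    return prob(lambda *w: pred(*w) and given(*w)) / prob(given)


p_A2 = prob(lambda a1, a2, a2p: a2)
p_A2p = prob(lambda a1, a2, a2p: a2p)
p_both = prob(lambda a1, a2, a2p: a2 and a2p)

# Conditional independence given +A1 and given -A1.
gaps = []
for v in (True, False):
    given = lambda a1, a2, a2p, v=v: a1 == v
    joint_c = cond(lambda a1, a2, a2p: a2 and a2p, given)
    product_c = cond(lambda a1, a2, a2p: a2, given) * cond(lambda a1, a2, a2p: a2p, given)
    gaps.append(abs(joint_c - product_c))

print(p_both, p_A2 * p_A2p, gaps)
```

With these numbers $P(A_2\wedge A'_2) = 0.31$ exceeds $P(A_2)P(A'_2) = 0.25$, while both conditional gaps vanish up to rounding.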

A different kind of causal net is shown in Figure 8.6. Here the causal arrows go from bottom to top, which is the same as the direction of epistemic support in Figure 8.2. In Figure 8.6, $A_2$ could stand for a mother (i.e. for the event that a mother carries a particular trait, for example having blue eyes), $A'_2$ could stand for her husband, and $A_1$ could stand for their daughter. Assuming that mother and father were not related, $A_2$ and $A'_2$ are unconditionally independent,

$$P(A_2\wedge A'_2) = P(A_2)\,P(A'_2),$$

but they become correlated on conditionalization by $A_1$,

$$P(A_2\wedge A'_2|\pm A_1) \neq P(A_2|\pm A_1)\,P(A'_2|\pm A_1).$$
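The reversed pattern for a collider can be checked in the same way. In this hypothetical sketch the two parents $A_2$ and $A'_2$ are drawn independently, and the child $A_1$ depends on both jointly through an invented conditional probability table; none of the numbers come from the text.

```python
import itertools

p_A2, p_A2p = 0.3, 0.4   # hypothetical: mother and father carry the trait, independently
# Hypothetical P(A_1 | A_2, A_2'): the daughter's trait depends on both parents jointly.
p_A1_given = {(True, True): 0.9, (True, False): 0.05,
              (False, True): 0.05, (False, False): 0.05}

joint = {}
for a2, a2p, a1 in itertools.product([True, False], repeat=3):
    p = (p_A2 if a2 else 1 - p_A2) * (p_A2p if a2p else 1 - p_A2p)
    c = p_A1_given[(a2, a2p)]
    joint[(a2, a2p, a1)] = p * (c if a1 else 1 - c)


def prob(pred):
    return sum(p for w, p in joint.items() if pred(*w))


def cond(pred, given):
    return prob(lambda *w: pred(*w) and given(*w)) / prob(given)


# Unconditionally the parents are independent ...
uncond_gap = prob(lambda a2, a2p, a1: a2 and a2p) - p_A2 * p_A2p

# ... but conditioning on the child (+A1 or -A1) makes them dependent.
cond_gaps = []
for v in (True, False):
    given = lambda a2, a2p, a1, v=v: a1 == v
    both = cond(lambda a2, a2p, a1: a2 and a2p, given)
    product = cond(lambda a2, a2p, a1: a2, given) * cond(lambda a2, a2p, a1: a2p, given)
    cond_gaps.append(both - product)

print(uncond_gap, cond_gaps)
```

The unconditional gap is zero up to rounding, whereas both conditional gaps are clearly non-zero (positive given $A_1$, negative given $\neg A_1$ with these numbers).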


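[Figure: $q$ at the top; $A_1$ and $A'_1$ in the middle; $A_2$, $A'_2$, $A''_2$, $A'''_2$ at the bottom; the causal arrows point upwards]

Fig. 8.6 Two-dimensional causal net with unshielded colliders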


The subgraph involving $A_2$, $A'_2$ and $A_1$ is a so-called unshielded collider. The behaviour of this collider, insofar as conditional and unconditional dependencies are concerned, is just the opposite of the behaviour of the common cause. Clearly Figure 8.6 is more like the two-dimensional justification tree of Figure 8.2 than is the common cause graph of Figure 8.5. In the justification tree, proposition $A_1$ is probabilistically supported by $A_2$ and $A'_2$: moieties of justification accrue to $A_1$ from $A_2$ and $A'_2$, and from the conditional probabilities. In the causal collider, $A_1$ is probabilistically caused by $A_2$ and $A'_2$. Similarly, parents $A''_2$ and $A'''_2$ cause $A'_1$, the event that their son carries the trait in question. And finally $A_1$ and $A'_1$ can cause the event that a child in the third generation has blue eyes.

Strictly speaking, Figure 8.6 is inaccurate, or at least ambiguous. The point is that $A_1$ would not be caused at all by $A_2$ in the absence of $A'_2$. We should replace Figure 8.6 by Figure 8.7, in which the joint nature of the causal influences is explicitly represented.

Mathematically, such a picture is called a directed hypergraph; its properties have been studied by Selim Berker in the context of justificatory trees rather than causal trees.²¹ Berker makes the point that such hypergraphs offer coherentists and infinitists a way of attaching a justification tree of beliefs or propositions to empirical facts. This is done without thereby making them foundational trees in which the facts constitute grounds in the foundationalist’s sense, that is, regress stoppers. For example, suppose now that $A_2$ in Figure 8.7 is an agent’s experience that the sun is shining, and that $A'_2$ is her belief that her eyes and visual cortex are functioning normally. Then $A_1$ could be the belief that the sun is indeed shining. The crux of the matter is that the fact $A_2$ does not by itself justify $A_1$, but does so only together with $A'_2$.

[Figure: $A_1$ and $A'_1$ at the top; $A_2$, $A'_2$, $A''_2$, $A'''_2$ at the bottom; joint links run from the pair $\{A_2, A'_2\}$ to $A_1$ and from the pair $\{A''_2, A'''_2\}$ to $A'_1$]
Fig. 8.7 Two-dimensional hypergraph


21 Berker 2015.

Berker claims that a coherentist (or infinitist) account of justification cannot consistently be based on probabilistic considerations. His reasoning is that the probabilistic coherence of a set of beliefs and experiences is the same as that of a similar set in which, however, all the experiences have been replaced by corresponding beliefs. He argues that the first set, the one including experiences, should be accorded a higher degree of justification than the second, which lacks experiences and is nothing but a collection of beliefs.

Berker’s idea seems to hinge on a Humean view in which experiences outweigh beliefs. More importantly in the present context, it only bears on models in which probabilistic coherence is a sufficient determinant of justification. For models like ours, in which probabilistic coherence is only necessary, it is not apposite. And of course the phenomenon of fading foundations is not restricted to propositions or beliefs: it manifests itself also in the domain of experiences.

Just as the ground’s share in the epistemic justification lessens, so the measure of the ground’s causal influence vanishes in the end. In general, whether a regress is epistemic or causal, and whether it is in one or in many dimensions, justification and causation will progressively emerge and foundations will gradually fade away.
Appendix A
The Rule of Total Probability 


Many of the results we use involve an iteration of the rule of total probability: 
in A.1 we explain how this works in detail. A finite number of iterations leads 
to a finite regress of probabilities, and in A.2 it is shown how to calculate the 
maximum error that one can make by limiting oneself to a finite regress. The 
infinite regress is considered in A.3-A.6; here convergence is demonstrated 
and the distinction between the usual and the exceptional classes is defined. 
Attention shifts in A.7 to the peculiar form of a regress of entailments; and 
finally the Markov condition is put under the theoretical microscope in A.8. 
The basic object of interest is a regress of propositions

$$A_0,\ A_1,\ A_2,\ \ldots,\ A_m,\ A_{m+1},$$

in which each proposition, except the target $A_0$, probabilistically supports its neighbour to the left. We first obtain upper and lower bounds on $P(A_n)$, as estimated from the finite chain; then we look at the infinite regress and show two things:


(i) The infinite series whose terms are products of conditional probabilities is convergent.

(ii) On condition that a certain asymptotic condition is satisfied by the conditional probabilities, the functional dependence of $P(A_n)$ on the value of $P(A_{m+1})$ disappears in the limit $m \to \infty$.


The asymptotic condition will be given explicitly in Section A.4; the ‘usual class’ is defined to be the set of all chains of propositions for which this asymptotic condition holds. All other chains belong to the ‘exceptional class’. ‘Fading foundations’ is the name we have given to phenomenon (ii); under these conditions $P(A_n)$ is equal to the sum of an infinite series of terms involving conditional probabilities only. In the bulk of the book the interest is in the target proposition, $A_0$, which is often denoted by $q$; but in the interests of generality we shall first give formulae for an arbitrary $A_n$, before specializing to the case of the target proposition.


Let us start by recalling some formalism. The unconditional probabilities $P(A_n)$ and $P(A_{n+1})$ are related by the rule of total probability,

$$P(A_n) = \beta_n + \gamma_n P(A_{n+1}), \tag{A.1}$$

with the abbreviations

$$\alpha_n = P(A_n|A_{n+1}), \qquad \beta_n = P(A_n|\neg A_{n+1}), \qquad \gamma_n = \alpha_n - \beta_n. \tag{A.2}$$

The condition of probabilistic support, $\gamma_n > 0$, will be imposed.


A.1 Iterating the rule of total probability 


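On iterating (A.1) once we obtain

$$\begin{aligned} P(A_n) &= \beta_n + \gamma_n\big[\beta_{n+1} + \gamma_{n+1}P(A_{n+2})\big]\\ &= \beta_n + \gamma_n\beta_{n+1} + \gamma_n\gamma_{n+1}P(A_{n+2}). \end{aligned} \tag{A.3}$$

We shall now show that, on iterating (A.1) $m-n$ times, we obtain

$$P(A_n) = \Delta_{n,m} + \Pi_{n,m}\,P(A_{m+1}), \tag{A.4}$$

where $\Pi_{n,m}$ is the finite product

$$\Pi_{n,m} = \gamma_n\gamma_{n+1}\cdots\gamma_m, \tag{A.5}$$

with $n \leq m$, and where $\Delta_{n,m}$ is the finite sum

$$\Delta_{n,m} = \beta_n + \Pi_{n,n}\beta_{n+1} + \Pi_{n,n+1}\beta_{n+2} + \cdots + \Pi_{n,m-1}\beta_m, \tag{A.6}$$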


with $n \leq m$. Note that $\Delta_{n,m}$ and $\Pi_{n,m}$ involve conditional probabilities only.

Eq.(A.4) will be proved by the method of mathematical induction. We need to show that, for a fixed $n$ less than $m$, if (A.4) is true for some particular $m$, then it is necessarily true with $m$ replaced by $m+1$. Substitute $m+1$ for $n$ in Eq.(A.1):

$$P(A_{m+1}) = \beta_{m+1} + \gamma_{m+1}P(A_{m+2}), \tag{A.7}$$

and insert this into Eq.(A.4). Since, by (A.5) and (A.6), $\Delta_{n,m} + \Pi_{n,m}\beta_{m+1} = \Delta_{n,m+1}$ and $\Pi_{n,m}\gamma_{m+1} = \Pi_{n,m+1}$, the result is

$$\begin{aligned} P(A_n) &= \Delta_{n,m} + \Pi_{n,m}\big[\beta_{m+1} + \gamma_{m+1}P(A_{m+2})\big]\\ &= \Delta_{n,m+1} + \Pi_{n,m+1}P(A_{m+2}), \end{aligned} \tag{A.8}$$

which is (A.4) with $m+1$ written in place of $m$. Hence, if (A.4) is true for some particular $m$ larger than $n$, it is true for all larger values of $m$. Since

$$\Pi_{n,n} = \gamma_n, \qquad \Pi_{n,n+1} = \gamma_n\gamma_{n+1}, \qquad \Delta_{n,n+1} = \beta_n + \Pi_{n,n}\beta_{n+1} = \beta_n + \gamma_n\beta_{n+1},$$

it follows that (A.4) is true when $m = n+1$, for (A.4) then reduces to (A.3). In this way the induction has been completed, and (A.4) has been proved to be valid for all $m > n$.

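In the special case $n = 0$, (A.4) can be written

$$P(A_0) = \beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \cdots + \gamma_0\gamma_1\cdots\gamma_{m-1}\beta_m + \gamma_0\gamma_1\cdots\gamma_m\,P(A_{m+1}),$$

which is Eq.(3.20) in Chapter 3.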
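Eq.(A.4) can also be confirmed mechanically. The sketch below, a minimal check under our own assumptions, computes $\Delta_{n,m}$ and $\Pi_{n,m}$ from (A.5) and (A.6) and compares $\Delta_{n,m} + \Pi_{n,m}P(A_{m+1})$ with the result of applying (A.1) link by link. Exact rational arithmetic from Python’s `fractions` module avoids rounding; the conditional probabilities and the value of $P(A_{m+1})$ are hypothetical, and the helper names `iterate_back` and `delta_pi` are ours.

```python
from fractions import Fraction as F


def iterate_back(alpha, beta, p_top, n, m):
    """Apply (A.1), P(A_k) = beta_k + gamma_k P(A_{k+1}), for k = m, m-1, ..., n."""
    p = p_top                                  # P(A_{m+1})
    for k in range(m, n - 1, -1):
        p = beta[k] + (alpha[k] - beta[k]) * p
    return p


def delta_pi(alpha, beta, n, m):
    """Return (Delta_{n,m}, Pi_{n,m}) as defined in (A.6) and (A.5)."""
    delta, pi = F(0), F(1)                     # pi starts as the empty product
    for k in range(n, m + 1):
        delta += pi * beta[k]                  # adds Pi_{n,k-1} * beta_k
        pi *= alpha[k] - beta[k]               # now pi = Pi_{n,k}
    return delta, pi


# Hypothetical conditional probabilities with gamma_k = alpha_k - beta_k > 0.
alpha = [F(9, 10), F(4, 5), F(7, 8), F(2, 3), F(5, 6), F(3, 4)]
beta = [F(1, 5), F(1, 3), F(1, 8), F(1, 4), F(1, 2), F(1, 10)]
p_top = F(1, 3)                                # hypothetical P(A_{m+1})

mismatches = []
for m in range(len(alpha)):
    for n in range(m + 1):
        delta, pi = delta_pi(alpha, beta, n, m)
        if iterate_back(alpha, beta, p_top, n, m) != delta + pi * p_top:
            mismatches.append((n, m))
print(mismatches)   # an empty list means (A.4) held exactly in every case
```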


A.2 Extrema of the finite series 


In this section we limit our attention to a finite regress and calculate the maximum and minimum values that the target probability could have, whatever the unknown unconditional probability $P(A_{m+1})$ might be. An approximate value of the target probability is the average of these values, and an upper bound on the error is one half of the difference between the maximum and minimum values. These results are crucial to our discussion of probabilistic justification as a trade-off in Section 5.3.

Thanks to probabilistic support, $\gamma_n > 0$, both $\Delta_{n,m}$ and $\Pi_{n,m}$ are non-negative. So the minimum value that $P(A_n)$ can have is obtained by setting $P(A_{m+1}) = 0$ in Eq.(A.4), and the maximum value is obtained by setting $P(A_{m+1}) = 1$. Accordingly, $P(A_n)$ is not less than $P^{\min}_m(A_n)$ and not greater than $P^{\max}_m(A_n)$, where

$$P^{\min}_m(A_n) = \Delta_{n,m} \tag{A.9}$$

$$P^{\max}_m(A_n) = \Delta_{n,m} + \Pi_{n,m}. \tag{A.10}$$

Since

$$P^{\min}_{m+1}(A_n) - P^{\min}_m(A_n) = \Delta_{n,m+1} - \Delta_{n,m} = \Pi_{n,m}\beta_{m+1} \geq 0,$$

it follows that $P^{\min}_m(A_n)$ is a monotonically increasing function of $m$. On the other hand,

$$P^{\max}_{m+1}(A_n) - P^{\max}_m(A_n) = \Delta_{n,m+1} - \Delta_{n,m} + \Pi_{n,m+1} - \Pi_{n,m} = \Pi_{n,m}(\alpha_{m+1} - 1) \leq 0,$$

so $P^{\max}_m(A_n)$ is a monotonically decreasing function of $m$. This means that the margin of error that one makes by truncating the regress decreases as one adds more links.

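In the particular case $n = 0$ we have $P(A_0) \geq \Delta_{0,m}$ and $P(A_0) \leq \Delta_{0,m} + \Pi_{0,m}$, and these inequalities lead to the following estimates for the target probability and the maximal error committed:

$$\begin{aligned} P(A_0) &= \Delta_{0,m} + \tfrac{1}{2}\Pi_{0,m} \pm \tfrac{1}{2}\Pi_{0,m}\\ &= \beta_0 + \gamma_0\beta_1 + \cdots + \gamma_0\gamma_1\cdots\gamma_{m-2}\beta_{m-1} + \gamma_0\gamma_1\cdots\gamma_{m-1}\big(\beta_m + \tfrac{1}{2}\gamma_m\big) \pm \tfrac{1}{2}\gamma_0\gamma_1\cdots\gamma_m. \end{aligned}$$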
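The trade-off between chain length and error margin can be tabulated directly from (A.9) and (A.10). The sketch below is an illustration of ours: it borrows the model (A.14) of Section A.5 purely as test input (for which the infinite-chain value of the target probability is 3/4) and prints the estimate $\Delta_{0,m} + \frac{1}{2}\Pi_{0,m}$ together with the error bound $\frac{1}{2}\Pi_{0,m}$ for increasing $m$. The function name `bounds` is hypothetical.

```python
from fractions import Fraction as F


def bounds(alpha, beta, m):
    """(P_min, P_max) of the target A_0 on the chain A_0, ..., A_{m+1}: (A.9), (A.10)."""
    delta, pi = F(0), F(1)
    for k in range(m + 1):
        delta += pi * beta(k)          # builds Delta_{0,m} term by term
        pi *= alpha(k) - beta(k)       # builds Pi_{0,m}
    return delta, delta + pi


# Test input borrowed from the model (A.14) of Section A.5.
def beta(n): return F(1, n + 3)
def alpha(n): return F(n + 1, n + 2) + F(1, n + 3)


rows = [bounds(alpha, beta, m) for m in range(8)]
for m, (lo, hi) in enumerate(rows):
    estimate, error = (lo + hi) / 2, (hi - lo) / 2
    print(f"m={m}: P(A_0) = {float(estimate):.4f} +/- {float(error):.4f}")
```

The lower bounds never decrease, the upper bounds never increase, and every interval contains 3/4.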


A.3 Convergence of the infinite series 


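From (A.2) we have that $\beta_n = \alpha_n - \gamma_n \leq 1 - \gamma_n$, because $\alpha_n$, being a probability, cannot be greater than unity. Since $\gamma_n$ is positive, so is $\Pi_{n,m} = \gamma_n\gamma_{n+1}\cdots\gamma_m$, from which it follows that

$$\Pi_{n,m}\beta_{m+1} \leq \gamma_n\gamma_{n+1}\cdots\gamma_m(1 - \gamma_{m+1}) = \Pi_{n,m} - \Pi_{n,m+1}.$$

Therefore, from (A.6),

$$\begin{aligned} \Delta_{n,m} &\leq \beta_n + (\Pi_{n,n} - \Pi_{n,n+1}) + (\Pi_{n,n+1} - \Pi_{n,n+2}) + \cdots + (\Pi_{n,m-1} - \Pi_{n,m})\\ &= \beta_n + \Pi_{n,n} - \Pi_{n,m} < \beta_n + \Pi_{n,n} = \beta_n + (\alpha_n - \beta_n) = \alpha_n. \end{aligned}$$

Now $\Delta_{n,m}$ is monotonically increasing as $m$ increases, and since these numbers are bounded by an $m$-independent number, namely $\alpha_n$, it follows from the monotone convergence theorem that $\Delta_{n,m}$ has a limit as $m$ tends to infinity. This means that the series

$$\Delta_{n,\infty} = \beta_n + \Pi_{n,n}\beta_{n+1} + \Pi_{n,n+1}\beta_{n+2} + \cdots$$

is convergent.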

This proof makes use of the condition of probabilistic support, namely $\gamma_n > 0$ for all $n$. Convergence (in the usual class) can also be demonstrated without probabilistic support, but since we are interested in epistemic justification, which has probabilistic support as a necessary condition, there is no point in giving the proof without the constraint. Moreover, the condition of probabilistic support is essential for justification as a trade-off (see Section 5.3 and Appendix A.2), and for convergence in the probabilistic networks that we discuss in Section 8.4. Incidentally, the condition is also required for convergence in the exceptional class.

In the special case $n = 0$, we conclude that the series (3.24), namely

$$\beta_0 + \gamma_0\beta_1 + \gamma_0\gamma_1\beta_2 + \gamma_0\gamma_1\gamma_2\beta_3 + \cdots,$$

is convergent. In the usual class the series equals the probability of the target proposition.
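The two ingredients of the convergence proof, monotone increase and the bound $\alpha_0$, can be watched on any sequence satisfying $\gamma_n > 0$. The sketch below draws 200 hypothetical links at random (the sequence is purely illustrative and not taken from the text) and accumulates the partial sums $\Delta_{0,m}$.

```python
import random

random.seed(7)
# Hypothetical regress: draw beta_n, then alpha_n > beta_n, so that gamma_n > 0.
beta, alpha = [], []
for _ in range(200):
    b = random.uniform(0.0, 0.5)
    beta.append(b)
    alpha.append(random.uniform(b + 1e-6, 1.0))

partial_sums = []          # Delta_{0,m} for m = 0, 1, 2, ...
delta, pi = 0.0, 1.0
for a, b in zip(alpha, beta):
    delta += pi * b        # add Pi_{0,m-1} * beta_m
    pi *= a - b            # pi becomes Pi_{0,m}
    partial_sums.append(delta)

increasing = all(x <= y for x, y in zip(partial_sums, partial_sums[1:]))
bounded = all(x <= alpha[0] for x in partial_sums)
print(increasing, bounded, partial_sums[-1], alpha[0])
```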


A.4 When does the remainder term vanish? 


We now wish to discover the condition under which the influence of $P(A_{m+1})$ on the value of $P(A_n)$ tends to zero in the limit $m \to \infty$.

Consider first the uniform case, in which the conditional probabilities are the same from link to link. If $\gamma_n = \gamma$, independently of $n$, then

$$\Pi_{n,m} = \gamma^{m-n+1}. \tag{A.11}$$

The only exceptional case here is when $\beta = 0$ and $\alpha = 1$, which corresponds to bi-implication. Apart from this extreme situation, it is the case that $\gamma < 1$, so $\gamma^{m-n+1}$ tends to zero as $m$ tends to infinity, and therefore $\Pi_{n,m}$ goes to zero in the infinite limit. As a result the remainder term $\Pi_{n,m}P(A_{m+1})$ in Eq.(A.4) vanishes too, given that $P(A_{m+1})$ cannot exceed unity.

This result, that $\Pi_{n,m}$ goes to zero in the limit, generally holds even when the conditional probabilities differ from link to link. For example, if there is a constant, $c$, less than unity, such that $\gamma_n \leq c < 1$ for all $n$, then $\Pi_{n,m} \leq c^{m-n+1}$, which also dies out in the limit. Moreover, this conclusion is usually true even when there is no such constant, $c$, and $\gamma_n$ tends to unity. Indeed, $\Pi_{n,m}$ will be zero in the limit unless $\gamma_m$ tends very quickly to unity as $m$ goes to infinity; such cases belong to the exceptional class.

To find a precise condition under which the remainder term is equal to zero in the limit, observe that $\gamma_n = \exp(\log\gamma_n) = \exp(-|\log\gamma_n|)$, so

$$\Pi_{n,\infty} = \gamma_n\gamma_{n+1}\gamma_{n+2}\gamma_{n+3}\cdots = \exp\Big[-\sum_{k=n}^{\infty}|\log\gamma_k|\Big]. \tag{A.12}$$

Thus $\Pi_{n,\infty}$ is zero if, and only if, the series $\sum_{k=n}^{\infty}|\log\gamma_k|$ diverges. Since all the terms in this series are non-negative, the series can only converge, or diverge to $+\infty$ (it cannot oscillate). If there is a real number $a > 0$, and an integer $N > a$, such that

$$\gamma_n \leq 1 - \frac{a}{n} \tag{A.13}$$

for all $n > N$, then $|\log\gamma_n| \geq a/n$ (because $-\log(1-x) \geq x$ for $0 \leq x < 1$), and the series diverges, which means that $\Pi_{n,\infty}$ is zero. Under condition (A.13) the remainder term disappears.
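The difference between the two classes is easy to see numerically. The sketch below compares $\Pi_{0,m}$ for the usual-class model (A.14) of Section A.5, where $\gamma_n = (n+1)/(n+2)$, with the exceptional-class model of Section A.6, where $\gamma_n = (n+1)(n+3)/(n+2)^2$, and recomputes one product through the logarithmic form (A.12). The helper name `pi_0m` is ours.

```python
import math
from fractions import Fraction as F


def pi_0m(gamma, m):
    """Pi_{0,m} = gamma_0 gamma_1 ... gamma_m, computed exactly."""
    out = F(1)
    for k in range(m + 1):
        out *= gamma(k)
    return out


def usual(n): return F(n + 1, n + 2)                          # gamma_n for (A.14)
def exceptional(n): return F((n + 1) * (n + 3), (n + 2) ** 2)  # gamma_n for Section A.6


for m in (10, 100, 1000):
    print(m, float(pi_0m(usual, m)), float(pi_0m(exceptional, m)))

# The same exceptional product via (A.12): exp(-sum |log gamma_k|), in floating point.
m = 1000
via_logs = math.exp(-sum(abs(math.log(float(exceptional(k)))) for k in range(m + 1)))
```

In the usual case the product equals $1/(m+2)$ and dies out; in the exceptional case it equals $(m+3)/(2(m+2))$ and levels off at 1/2.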

Summarizing, the remainder term generally goes to zero in the limit of 
infinite m; only in the exceptional class does it fail to do so. 


A.5 Example in the usual class 


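The model

$$\alpha_n = 1 - \frac{1}{n+2} + \frac{1}{n+3}, \qquad \beta_n = \frac{1}{n+3} \tag{A.14}$$

belongs to the usual class, since $\beta_n$ behaves like $1/n$ as $n$ tends to infinity. We find, using the notation of A.1,

$$\Pi_{n,m} = \gamma_n\gamma_{n+1}\cdots\gamma_m = \frac{n+1}{n+2}\cdot\frac{n+2}{n+3}\cdots\frac{m+1}{m+2} = \frac{n+1}{m+2},$$

$$\Pi_{n,m}\beta_{m+1} = \frac{n+1}{m+2}\cdot\frac{1}{m+4} = \frac{n+1}{2}\Big(\frac{1}{m+2} - \frac{1}{m+4}\Big).$$

Hence

$$\begin{aligned} \Delta_{n,m} &= \beta_n + \Pi_{n,n}\beta_{n+1} + \Pi_{n,n+1}\beta_{n+2} + \cdots + \Pi_{n,m-1}\beta_m\\ &= \frac{1}{n+3} + \frac{n+1}{2}\Big[\Big(\frac{1}{n+2} - \frac{1}{n+4}\Big) + \Big(\frac{1}{n+3} - \frac{1}{n+5}\Big) + \cdots + \Big(\frac{1}{m+1} - \frac{1}{m+3}\Big)\Big]\\ &= \frac{1}{n+3} + \frac{n+1}{2}\Big(\frac{1}{n+2} + \frac{1}{n+3} - \frac{1}{m+2} - \frac{1}{m+3}\Big)\\ &= 1 - \frac{1}{2(n+2)} - \frac{n+1}{2}\,\frac{2m+5}{(m+2)(m+3)}. \end{aligned}$$

From Eq.(A.4) we then have

$$P(A_n) = 1 - \frac{1}{2(n+2)} - \frac{(n+1)(2m+5)}{2(m+2)(m+3)} + \frac{n+1}{m+2}\,P(A_{m+1}). \tag{A.15}$$

In the limit of an infinite linear chain, $m \to \infty$, we obtain

$$P(A_n) = 1 - \frac{1}{2(n+2)}.$$

In the particular case $n = 0$, Eq.(A.15) becomes

$$P(A_0) = \frac{3}{4} - \frac{2m+5}{2(m+2)(m+3)} + \frac{1}{m+2}\,P(A_{m+1}), \tag{A.16}$$

which is Eq.(3.22) in Chapter 3.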
On the other hand, if the $A_n$ form a finite loop instead of an infinite chain, with $A_{m+1} = A_0$, then (A.16) reads

$$P(A_0) = \frac{3}{4} - \frac{2m+5}{2(m+2)(m+3)} + \frac{1}{m+2}\,P(A_0), \tag{A.17}$$

and this can be solved for $P(A_0)$ to yield

$$P(A_0) = \frac{3}{4} - \frac{1}{4(m+3)}, \tag{A.18}$$

which is Eq.(8.3). On substituting the value (A.18) into (A.15), with $P(A_0)$ in place of $P(A_{m+1})$, we finally obtain the probability at an arbitrary site on the loop:

$$P(A_n) = 1 - \frac{1}{2(n+2)} - \frac{n+1}{4(m+3)},$$

which is valid for $0 \leq n \leq m$.
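Closed forms like (A.15), (A.18) and the loop result can be checked against brute-force iteration of (A.1). The sketch below, which is our own verification aid rather than part of the derivation, uses exact fractions for the model (A.14) with $\gamma_n = (n+1)/(n+2)$; the helper names `p_from_top`, `eq_A15` and `loop_p0` are hypothetical.

```python
from fractions import Fraction as F


def beta(n): return F(1, n + 3)          # beta_n of (A.14)
def gamma(n): return F(n + 1, n + 2)     # gamma_n = alpha_n - beta_n for (A.14)


def p_from_top(n, m, p_top):
    """P(A_n) from P(A_{m+1}) = p_top by iterating (A.1) exactly."""
    p = p_top
    for k in range(m, n - 1, -1):
        p = beta(k) + gamma(k) * p
    return p


def eq_A15(n, m, p_top):
    return (1 - F(1, 2 * (n + 2))
            - F((n + 1) * (2 * m + 5), 2 * (m + 2) * (m + 3))
            + F(n + 1, m + 2) * p_top)


def loop_p0(m):
    """Solve the loop equation (A.17): P(A_0) = Delta_{0,m} + Pi_{0,m} P(A_0)."""
    delta = p_from_top(0, m, F(0))
    pi = p_from_top(0, m, F(1)) - delta
    return delta / (1 - pi)


checks_A15 = all(p_from_top(n, m, p) == eq_A15(n, m, p)
                 for m in range(8) for n in range(m + 1) for p in (F(0), F(1, 3), F(1)))
checks_A18 = all(loop_p0(m) == F(3, 4) - F(1, 4 * (m + 3)) for m in range(8))
checks_site = all(p_from_top(n, m, loop_p0(m))
                  == 1 - F(1, 2 * (n + 2)) - F(n + 1, 4 * (m + 3))
                  for m in range(8) for n in range(m + 1))
print(checks_A15, checks_A18, checks_site)
```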


A.6 Example in the exceptional class 


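An example in the exceptional class is

$$\alpha_n = \frac{1}{(n+2)(n+3)} + \frac{(n+1)(n+3)}{(n+2)^2}, \qquad \beta_n = \frac{1}{(n+2)(n+3)}.$$

The crucial difference is that here $\beta_n$ and $1 - \alpha_n = 1 - \beta_n - \gamma_n$ both tend to zero at least as fast as $1/n^2$, as $n$ tends to infinity. To derive $P(A_n)$ we first calculate

$$\Pi_{n,m} = \gamma_n\gamma_{n+1}\cdots\gamma_m = \Big[\frac{n+1}{n+2}\cdot\frac{n+3}{n+2}\Big]\cdots\Big[\frac{m+1}{m+2}\cdot\frac{m+3}{m+2}\Big] = \frac{n+1}{n+2}\cdot\frac{m+3}{m+2},$$

$$\Pi_{n,m}\beta_{m+1} = \frac{n+1}{n+2}\cdot\frac{m+3}{m+2}\cdot\frac{1}{(m+3)(m+4)} = \frac{n+1}{2(n+2)}\Big(\frac{1}{m+2} - \frac{1}{m+4}\Big).$$

After some algebra we find

$$\Delta_{n,m} = \beta_n + \Pi_{n,n}\beta_{n+1} + \Pi_{n,n+1}\beta_{n+2} + \cdots + \Pi_{n,m-1}\beta_m = \frac{2n+3}{2(n+2)^2} - \frac{n+1}{2(n+2)}\,\frac{2m+5}{(m+2)(m+3)}.$$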


From Eq.(A.4) we deduce that¹


1 An easier way to derive this equation is by putting $Q_n = \frac{n+2}{n+1}P(A_n)$ in Eq.(A.1), which leads to
$$P(A_n) = \frac{2n+3}{2(n+2)^2} - \frac{n+1}{2(n+2)}\,\frac{2m+5}{(m+2)(m+3)} + \frac{n+1}{n+2}\,\frac{m+3}{m+2}\,P(A_{m+1}).$$

In the particular case $n = 0$, this becomes

$$P(A_0) = \frac{3}{8} - \frac{2m+5}{4(m+2)(m+3)} + \frac{m+3}{2(m+2)}\,P(A_{m+1}), \tag{A.19}$$

which is Eq.(3.26). In the limit that $m$ tends to infinity we find formally

$$P(A_0) = \frac{3}{8} + \frac{1}{2}\,P(A_\infty),$$

where $P(A_\infty)$ is an indeterminate number in the interval $[0,1]$. However, for the finite loop we can set $P(A_{m+1}) = P(A_0)$ in (A.19) and solve the linear equation for $P(A_0)$:

$$P(A_0) = \Big[\frac{3}{8} - \frac{2m+5}{4(m+2)(m+3)}\Big]\Big/\Big[1 - \frac{m+3}{2(m+2)}\Big] = \frac{3}{4} - \frac{1}{4(m+3)},$$

which is Eq.(8.6).
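The exceptional-class formulas can be checked in the same exact way. The sketch below, again our own verification aid with hypothetical helper names, compares the closed form for $P(A_n)$ with direct iteration of (A.1), tests the footnoted recursion for $Q_n = \frac{n+2}{n+1}P(A_n)$, shows that the remainder coefficient $\Pi_{0,m}$ does not vanish, and solves the finite-loop equation.

```python
from fractions import Fraction as F


def beta(n): return F(1, (n + 2) * (n + 3))
def gamma(n): return F((n + 1) * (n + 3), (n + 2) ** 2)   # alpha_n - beta_n


def p_from_top(n, m, p_top):
    """P(A_n) from P(A_{m+1}) = p_top by iterating (A.1) exactly."""
    p = p_top
    for k in range(m, n - 1, -1):
        p = beta(k) + gamma(k) * p
    return p


def closed_form(n, m, p_top):
    """The expression for P(A_n) derived above from Eq.(A.4)."""
    return (F(2 * n + 3, 2 * (n + 2) ** 2)
            - F(n + 1, 2 * (n + 2)) * F(2 * m + 5, (m + 2) * (m + 3))
            + F(n + 1, n + 2) * F(m + 3, m + 2) * p_top)


ok_closed = all(p_from_top(n, m, p) == closed_form(n, m, p)
                for m in range(8) for n in range(m + 1) for p in (F(0), F(1, 2), F(1)))


# Footnote substitution: Q_n = (1/(n+1) - 1/(n+3))/2 + Q_{n+1}.
def Q(n, m, p_top):
    return F(n + 2, n + 1) * p_from_top(n, m, p_top)


ok_Q = all(Q(n, 9, F(1, 3)) == F(1, 2) * (F(1, n + 1) - F(1, n + 3)) + Q(n + 1, 9, F(1, 3))
           for n in range(9))

# Remainder coefficient Pi_{0,m} = (m+3)/(2(m+2)): it tends to 1/2, not to 0.
remainder = [p_from_top(0, m, F(1)) - p_from_top(0, m, F(0)) for m in (10, 100, 1000)]


def loop_p0(m):
    """Finite loop A_{m+1} = A_0: solve the linear equation for P(A_0)."""
    delta = p_from_top(0, m, F(0))
    pi = p_from_top(0, m, F(1)) - delta
    return delta / (1 - pi)


ok_loop = all(loop_p0(m) == F(3, 4) - F(1, 4 * (m + 3)) for m in range(8))
print(ok_closed, ok_Q, [float(r) for r in remainder], ok_loop)
```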


A.7 The regress of entailment 


The classical regress is one of entailment, in which every proposition, $A_{n+1}$, entails the proposition to its left, $A_n$, for all $n = 0, 1, 2, \ldots$. In this case, $\alpha_n = P(A_n|A_{n+1}) = 1$ for all $n$. From (A.2) we have that

$$\beta_n = \alpha_n - \gamma_n = 1 - \gamma_n.$$

Then Eq.(A.6) takes on the form

$$\Delta_{n,m} = 1 - \gamma_n + \Pi_{n,n}(1 - \gamma_{n+1}) + \Pi_{n,n+1}(1 - \gamma_{n+2}) + \cdots + \Pi_{n,m-1}(1 - \gamma_m) = 1 - \Pi_{n,m}. \tag{A.20}$$


$$Q_n = \frac{1}{2}\Big(\frac{1}{n+1} - \frac{1}{n+3}\Big) + Q_{n+1}.$$

It is a simple matter to concatenate this relation to obtain the relation between $Q_n$ and $Q_{m+1}$, and thence that between $P(A_n)$ and $P(A_{m+1})$. A similar ploy could have been used in Section A.5 by means of the substitution $Q_n = P(A_n)/(n+1)$; but in that case not much labour would have been saved.
From Eq.(A.4) we then obtain

$$P(A_n) = 1 - \Pi_{n,m} + \Pi_{n,m}\,P(A_{m+1}),$$

which is equivalent to

$$P(\neg A_n) = \Pi_{n,m}\,P(\neg A_{m+1}). \tag{A.21}$$

In the special case $n = 0$, this reads

$$P(\neg A_0) = \gamma_0\gamma_1\cdots\gamma_m\,P(\neg A_{m+1}),$$

which is Eq.(3.27).
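For an entailment regress, (A.21) says that the probability of the negation is simply passed down the chain and multiplied by each $\gamma_n$. The sketch below checks this exactly for a hypothetical chain of twelve links with randomly chosen $\gamma_n$ (and $\alpha_n = 1$), with a hypothetical value for $P(A_{m+1})$.

```python
import random
from fractions import Fraction as F

random.seed(3)
# Hypothetical entailment chain: alpha_n = 1, so beta_n = 1 - gamma_n.
gamma = [F(random.randint(1, 99), 100) for _ in range(12)]
alpha = [F(1)] * len(gamma)
beta = [a - g for a, g in zip(alpha, gamma)]

p_top = F(2, 7)          # hypothetical P(A_{m+1}), with m = 11
p = p_top
for k in reversed(range(len(gamma))):     # iterate (A.1) from A_m down to A_0
    p = beta[k] + gamma[k] * p

product = F(1)
for g in gamma:
    product *= g         # Pi_{0,m} = gamma_0 ... gamma_m

# (A.21) with n = 0: P(not A_0) = Pi_{0,m} P(not A_{m+1}).
print(1 - p == product * (1 - p_top))
```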


A.8 Markov condition and conjunctions 


In our discussion of causal chains in Section 8.6, we remarked that one way of demonstrating that there indeed exists a probability distribution over all the possible conjunctions of the propositions in a probabilistic regress was to impose a suitable Markov condition. Here we show how to construct the probability of a typical conjunction.

Suppose then that the following Markov condition holds:

$$P(A_n|\pm A_{n+1}\wedge Z) = P(A_n|\pm A_{n+1}) \tag{A.22}$$

for all $n$, where $\pm A_{n+1}$ means $A_{n+1}$ or $\neg A_{n+1}$, and where $Z$ stands for any event, $A_m$, such that $m \geq n+2$, or its negation, or for any conjunction of such events, or their negations. We shall illustrate how one can calculate the probability of any conjunction of the $A_n$ or their negations by working out one representative example in detail:

$$\begin{aligned} P(A_1\wedge\neg A_3\wedge A_4) &= P(A_1\wedge A_2\wedge\neg A_3\wedge A_4) + P(A_1\wedge\neg A_2\wedge\neg A_3\wedge A_4)\\ &= P(A_1|A_2\wedge\neg A_3\wedge A_4)\,P(A_2\wedge\neg A_3\wedge A_4)\\ &\quad + P(A_1|\neg A_2\wedge\neg A_3\wedge A_4)\,P(\neg A_2\wedge\neg A_3\wedge A_4)\\ &= P(A_1|A_2)\,P(A_2|\neg A_3\wedge A_4)\,P(\neg A_3\wedge A_4)\\ &\quad + P(A_1|\neg A_2)\,P(\neg A_2|\neg A_3\wedge A_4)\,P(\neg A_3\wedge A_4)\\ &= P(A_1|A_2)\,P(A_2|\neg A_3)\,P(\neg A_3|A_4)\,P(A_4)\\ &\quad + P(A_1|\neg A_2)\,P(\neg A_2|\neg A_3)\,P(\neg A_3|A_4)\,P(A_4)\\ &= \big[\alpha_1\beta_2 + \beta_1(1-\beta_2)\big](1-\alpha_3)\,P(A_4)\\ &= (\beta_1 + \gamma_1\beta_2)(1-\alpha_3)\,P(A_4). \end{aligned}$$
The unconditional probability is given by the following convergent series of terms that only involve the conditional probabilities:

$$P(A_4) = \beta_4 + \gamma_4\beta_5 + \gamma_4\gamma_5\beta_6 + \gamma_4\gamma_5\gamma_6\beta_7 + \cdots.$$

It is assumed that the set of conditional probabilities belongs to the usual class. Any other conjunction of propositions or events $A_n$ and $\neg A_n$ can be handled in an analogous manner.
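The recipe carries over to other conjunctions. The sketch below, on a hypothetical truncated chain $A_0, \ldots, A_4$ built as a Markov product, checks two further conjunctions that we worked out by the same steps: $P(A_0\wedge\neg A_2) = (\beta_0 + \gamma_0\beta_1)(1 - P(A_2))$ and $P(\neg A_0\wedge A_2\wedge\neg A_3) = [1 - (\beta_0 + \gamma_0\alpha_1)]\,\beta_2\,(1 - P(A_3))$. These two formulas are our own illustrations, not results quoted from the text, and all numerical values are invented.

```python
import itertools


def markov_chain_joint(alpha, beta, p_last):
    """Joint distribution of A_0..A_N as a Markov product, cf. Eq.(A.22)."""
    N = len(alpha)
    joint = {}
    for xs in itertools.product([True, False], repeat=N + 1):
        p = p_last if xs[N] else 1 - p_last
        for n in range(N):
            c = alpha[n] if xs[n + 1] else beta[n]
            p *= c if xs[n] else 1 - c
        joint[xs] = p
    return joint


alpha = [0.95, 0.7, 0.8, 0.6]   # hypothetical alpha_0..alpha_3
beta = [0.3, 0.2, 0.4, 0.1]     # hypothetical beta_0..beta_3
gamma = [a - b for a, b in zip(alpha, beta)]
joint = markov_chain_joint(alpha, beta, p_last=0.35)


def prob(pred):
    return sum(p for xs, p in joint.items() if pred(xs))


# P(A_0 & not A_2) = P(A_0 | not A_2) P(not A_2).
lhs1 = prob(lambda xs: xs[0] and not xs[2])
rhs1 = (beta[0] + gamma[0] * beta[1]) * (1 - prob(lambda xs: xs[2]))

# P(not A_0 & A_2 & not A_3) = P(not A_0 | A_2) P(A_2 | not A_3) P(not A_3).
lhs2 = prob(lambda xs: not xs[0] and xs[2] and not xs[3])
rhs2 = (1 - beta[0] - gamma[0] * alpha[1]) * beta[2] * (1 - prob(lambda xs: xs[3]))
print(abs(lhs1 - rhs1), abs(lhs2 - rhs2))
```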

An interesting consequence of the imposition of the Markov condition 
has to do with the possible transitivity of probabilistic support. To see this, 
consider the following measure of the probabilistic support that A,, gives to 
An: 

S(An, Am) = P(An|Am) —P(An|7Am) - 


According to (A.2), $S(A_n, A_{n+1}) = \gamma_n$. We shall show that, under the Markov condition, and for any $m$ larger than $n$, $S(A_n, A_m) = \gamma_n\gamma_{n+1}\cdots\gamma_{m-1}$. The condition of probabilistic support means that all the $\gamma_n$ are positive, and so $S(A_n, A_m)$ is also positive for all $n < m$. This shows that probabilistic support is transitive under the Markov condition.² Although the ground, $A_{m+1}$, supports the target, $A_0$, it does so to a degree that becomes smaller and smaller as the chain gets longer and longer. In the usual class the product $\gamma_0\gamma_1\gamma_2\cdots$ diverges to zero, so the support that $A_{m+1}$ gives to $A_0$ dwindles away to nothing as $m$ tends to infinity, whereas in the exceptional class it is positive, so in this case the support, although it continues to dwindle, does not go all the way to zero.³
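The contrast between the two classes can be made concrete with a short Python sketch of the running product $\gamma_0\gamma_1\cdots\gamma_m$, which by (A.23) is the support $S(A_0, A_{m+1})$. Both sequences are hypothetical: a constant $\gamma_n = 0.9$ stands in for the usual class, and $\gamma_n = 1 - 1/(n+2)^2$ for the exceptional class, chosen because the infinite product of $1 - 1/k^2$ over $k \geq 2$ converges to $1/2$ rather than to zero.

```python
def support_products(gammas):
    """Running products gamma_0, gamma_0*gamma_1, ...; entry m is S(A_0, A_{m+1}) by (A.23)."""
    products, acc = [], 1.0
    for g in gammas:
        acc *= g
        products.append(acc)
    return products


M = 1000  # number of links considered

# Hypothetical usual-class chain: constant gamma_n = 0.9, so the support decays geometrically.
usual = support_products([0.9] * M)

# Hypothetical exceptional-class chain: gamma_n = 1 - 1/(n+2)^2; the partial products
# equal (M+2)/(2(M+1)) and tend to 1/2, not to zero.
exceptional = support_products([1 - 1 / (n + 2) ** 2 for n in range(M)])

print(usual[-1], exceptional[-1])
```

In both cases the support keeps shrinking as the ground recedes; only in the second does it level off at a positive value.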
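To prove that, under the Markov condition,

$$S(A_n, A_m) = \gamma_n\gamma_{n+1}\cdots\gamma_{m-1}, \tag{A.23}$$

we take recourse again to the method of mathematical induction. Consider

$$
\begin{aligned}
P(A_n \wedge A_{m+1}) &= P(A_n \wedge A_m \wedge A_{m+1}) + P(A_n \wedge \neg A_m \wedge A_{m+1})\\
&= P(A_n|A_m \wedge A_{m+1})\,P(A_m \wedge A_{m+1}) + P(A_n|\neg A_m \wedge A_{m+1})\,P(\neg A_m \wedge A_{m+1})\\
&= P(A_n|A_m)\,P(A_m|A_{m+1})\,P(A_{m+1}) + P(A_n|\neg A_m)\,P(\neg A_m|A_{m+1})\,P(A_{m+1}),
\end{aligned}
$$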


where the last line follows because of the Markov condition. On dividing by $P(A_{m+1})$ we find


² The first proof of the transitivity of probabilistic support under a condition of screening off is in Reichenbach 1956 on page 160, Eq.(12). A later proof can be found in Shogenji 2003.

³ This dwindling of support as the chain increases in length was noted in Roche and Shogenji 2014.
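$$P(A_n|A_{m+1}) = P(A_n|A_m)\,P(A_m|A_{m+1}) + P(A_n|\neg A_m)\,P(\neg A_m|A_{m+1}). \tag{A.24}$$

Similarly, replacing $A_{m+1}$ by $\neg A_{m+1}$ in the above reasoning, we obtain

$$P(A_n|\neg A_{m+1}) = P(A_n|A_m)\,P(A_m|\neg A_{m+1}) + P(A_n|\neg A_m)\,P(\neg A_m|\neg A_{m+1}). \tag{A.25}$$

Subtracting Eq.(A.25) from Eq.(A.24), we see that

$$
\begin{aligned}
P(A_n|A_{m+1}) - P(A_n|\neg A_{m+1}) &= P(A_n|A_m)\bigl[P(A_m|A_{m+1}) - P(A_m|\neg A_{m+1})\bigr]\\
&\quad + P(A_n|\neg A_m)\bigl[P(\neg A_m|A_{m+1}) - P(\neg A_m|\neg A_{m+1})\bigr]\\
&= \bigl[P(A_n|A_m) - P(A_n|\neg A_m)\bigr]\bigl[P(A_m|A_{m+1}) - P(A_m|\neg A_{m+1})\bigr],
\end{aligned}
$$

where the last step uses $P(\neg A_m|X) = 1 - P(A_m|X)$, so that the second bracket on the first line is the negative of the first.

That is, under the Markov condition,

$$S(A_n, A_{m+1}) = S(A_n, A_m)\,S(A_m, A_{m+1}).$$

Now $S(A_m, A_{m+1}) = \gamma_m$, so if Eq.(A.23) is true for some $m > n$, then

$$S(A_n, A_{m+1}) = \gamma_n\gamma_{n+1}\cdots\gamma_{m-1}\gamma_m, \tag{A.26}$$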


which has the same form as (A.23), with m replaced by m + 1. Since (A.23) 
is true for m = n + 1, the induction is complete. 
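The product rule (A.23) can also be confirmed by brute force. This Python sketch assumes hypothetical values of $\alpha_n$, $\beta_n$ and $P(A_4)$ for a five-member Markov chain $A_0, \dots, A_4$, computes $S(A_n, A_m) = P(A_n|A_m) - P(A_n|\neg A_m)$ by summing the joint distribution over all 32 truth assignments, and compares the result with $\gamma_n\gamma_{n+1}\cdots\gamma_{m-1}$. It uses `math.prod`, which needs Python 3.8 or later.

```python
from itertools import product
from math import prod

# Hypothetical conditional probabilities for a chain A_0 <- A_1 <- ... <- A_4.
alpha = [0.9, 0.8, 0.75, 0.7]   # alpha[n] = P(A_n | A_{n+1})
beta = [0.2, 0.3, 0.1, 0.4]     # beta[n]  = P(A_n | not A_{n+1})
p_last = 0.55                   # hypothetical P(A_4)
N = len(alpha)                  # index of the last proposition, here 4


def joint(bits):
    """Markov-chain probability of a full truth assignment (a_0, ..., a_N)."""
    p = p_last if bits[N] else 1 - p_last
    for n in range(N):
        p_true = alpha[n] if bits[n + 1] else beta[n]
        p *= p_true if bits[n] else 1 - p_true
    return p


worlds = list(product((True, False), repeat=N + 1))


def cond_prob(n, m, a_m):
    """P(A_n | A_m = a_m), obtained by summing the joint distribution."""
    num = sum(joint(w) for w in worlds if w[n] and w[m] == a_m)
    den = sum(joint(w) for w in worlds if w[m] == a_m)
    return num / den


def support(n, m):
    """S(A_n, A_m) = P(A_n | A_m) - P(A_n | not A_m)."""
    return cond_prob(n, m, True) - cond_prob(n, m, False)


for m in range(1, N + 1):
    gammas = [alpha[k] - beta[k] for k in range(m)]
    print(m, support(0, m), prod(gammas))
```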


Appendix B 
Closure Under Conjunction 


In Section 6.5 we noted that Tomoji Shogenji has constructed a measure of justification that takes account of intuitions regarding closure and independence. Here we shall spell out this measure, $J$, by the method of one of us.¹ If $J(h,e)$ is a continuous function of $x = P(h|e)$ and $y = P(h)$ only, we may write

$$J(h,e) = F(x,y), \tag{B.1}$$

where $F(x,y)$ is a continuous function for $x \in [0,1]$ and $y \in (0,1)$. Discontinuities or divergences are allowed if $P(h)$ is extremal (0 or 1), but continuity with respect to the conditional probability, $P(h|e)$, is required at both end points.

Let $h_1$, $h_2$ and $e$ be propositions such that

$$P(h_1|e) = P(h_2|e) = x, \qquad P(h_1) = P(h_2) = y,$$

and let $h_1$ and $h_2$ be independent of one another, conditionally with respect to $e$, and also unconditionally:

$$P(h_1 \wedge h_2|e) = P(h_1|e)\,P(h_2|e) = x^2, \qquad P(h_1 \wedge h_2) = P(h_1)\,P(h_2) = y^2.$$


If $J(h_1,e) = s$ and $J(h_2,e) = s$, then it is required that also $J(h_1 \wedge h_2, e) = s$. Thus $J(h_1 \wedge h_2, e) = J(h_1, e)$, and so, from Eq.(B.1),

$$F(x,y) = F(x^2, y^2). \tag{B.2}$$


Change the variables and the function from $F(x,y)$ to $G(x,u)$, where

$$u = \frac{\log x}{\log y}, \qquad G(x,u) = F(x,y).$$

¹ Shogenji 2012; Atkinson 2012.


Condition (B.2) becomes

$$G(x,u) = G(x^2,u).$$

For any $x \in (0,1)$, we can iterate this equation to obtain

$$G(x,u) = G(x^2,u) = G(x^4,u) = \dots = G(x^{2^n},u).$$


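Since the function $G(x,u)$ is required to be continuous at $x = 0$, and $x^{2^n} \to 0$ as $n \to \infty$ for any $x \in (0,1)$, we can take the limit $n \to \infty$ and conclude that $G(x,u) = G(0,u) = f(u)$ is an arbitrary continuous function of $u$. Hence

$$J(h,e) = f\!\left(\frac{\log P(h|e)}{\log P(h)}\right). \tag{B.3}$$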


$J(h,e)$ is an increasing function of $P(h|e)$ and a decreasing function of $P(h)$, so it follows that $f(u)$ must be a decreasing function of $u$ (since $\log P(h|e)$ and $\log P(h)$ are both negative). The most general function of justification that satisfies Eq.(B.2) has the form (B.3), subject to the constraint that $f(u)$ is a continuous, monotonically decreasing function of $u$.

We shall generalize this result by supposing now only that $J(h_1,e) > s$ and $J(h_2,e) > s$, instead of the more restrictive $J(h_1,e) = s$ and $J(h_2,e) = s$. So

$$\frac{\log P(h_1|e)}{\log P(h_1)} < f^{-1}(s) \quad\text{and}\quad \frac{\log P(h_2|e)}{\log P(h_2)} < f^{-1}(s),$$

where the inverse function, $f^{-1}$, is guaranteed to exist, given the monotonicity of $f$. Then


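$$
\begin{aligned}
\log P(h_1 \wedge h_2|e) &= \log\bigl[P(h_1|e)\,P(h_2|e)\bigr]\\
&= \log P(h_1|e) + \log P(h_2|e)\\
&> f^{-1}(s)\bigl[\log P(h_1) + \log P(h_2)\bigr]\\
&= f^{-1}(s)\log\bigl[P(h_1)\,P(h_2)\bigr]\\
&= f^{-1}(s)\log P(h_1 \wedge h_2).
\end{aligned}
$$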


Therefore, remembering again that the logarithms are negative, we have

$$\frac{\log P(h_1 \wedge h_2|e)}{\log P(h_1 \wedge h_2)} < f^{-1}(s),$$

and so $J(h_1 \wedge h_2, e) > s$. A similar proof works with the inequalities working in the opposite direction, i.e. if $J(h_1,e) < s$ and $J(h_2,e) < s$ then $J(h_1 \wedge h_2, e) < s$. Moreover, the method extends straightforwardly to an arbitrary finite number of independent hypotheses $h_1, h_2, \dots, h_n$, instead of two. This concludes the demonstration that Eq.(B.3) encapsulates the most general measure of justification.

All measures that satisfy the above conditions are ordinally equivalent to one another. For consider two different measures:

$$J_1(h,e) = f_1\!\left(\frac{\log P(h|e)}{\log P(h)}\right), \qquad J_2(h,e) = f_2\!\left(\frac{\log P(h|e)}{\log P(h)}\right).$$

Because $f_1$ is a monotonically decreasing function, a necessary and sufficient condition that $J_1(h_1,e_1) > J_1(h_2,e_2)$ is

$$\frac{\log P(h_1|e_1)}{\log P(h_1)} < \frac{\log P(h_2|e_2)}{\log P(h_2)},$$

and because of the monotonicity of $f_2$, this is a necessary and sufficient condition that $J_2(h_1,e_1) > J_2(h_2,e_2)$. Analogous reasoning holds if the sign $>$ is replaced by $<$ or by $=$. Thus all measures of justification are ordinally equivalent to one another.
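As an illustration of ordinal equivalence, the following Python sketch applies two decreasing functions, $f_1(u) = 1 - u$ and $f_2(u) = e^{-u}$, to $u = \log P(h|e)/\log P(h)$ and checks that they rank every pair of cases in the same order. The choice of $f_2$ and the $(P(h|e), P(h))$ pairs are hypothetical; $f_2$ is not normalized in any particular way, since the argument above uses only monotonicity.

```python
import math
from itertools import combinations


def u_value(p_h_given_e, p_h):
    """u = log P(h|e) / log P(h), assuming 0 < P(h) < 1 and 0 < P(h|e) <= 1."""
    return math.log(p_h_given_e) / math.log(p_h)


def j1(p_h_given_e, p_h):
    """Measure built from the hypothetical choice f_1(u) = 1 - u."""
    return 1 - u_value(p_h_given_e, p_h)


def j2(p_h_given_e, p_h):
    """Measure built from the hypothetical choice f_2(u) = exp(-u)."""
    return math.exp(-u_value(p_h_given_e, p_h))


# Hypothetical (P(h|e), P(h)) pairs for six hypothesis-evidence combinations.
cases = [(0.9, 0.5), (0.6, 0.5), (0.3, 0.2), (0.99, 0.9), (0.5, 0.5), (0.2, 0.4)]


def same_order(a, b):
    """True if j1 and j2 order cases a and b in the same way (>, < or =)."""
    d1 = j1(*a) - j1(*b)
    d2 = j2(*a) - j2(*b)
    return (d1 > 0) == (d2 > 0) and (d1 < 0) == (d2 < 0)


print(all(same_order(a, b) for a, b in combinations(cases, 2)))
```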

If $h$ and $e$ are such that $P(h|e) = P(h)$, then $J(h,e) = f(1)$, irrespective of the value of $P(h) \in (0,1)$. This is the condition of equineutrality, and we conventionally set $f(1) = 0$. If, on the other hand, $h$ and $e$ are such that $P(h|e) = 1$, then $J(h,e) = f(0)$, irrespective of the value of $P(h) \in (0,1)$. This is the condition of equimaximality, and we set $f(0) = 1$.

The simplest realization of the above constraints is $f(u) = 1 - u$, which leads to

$$J(h,e) = 1 - \frac{\log P(h|e)}{\log P(h)} = \frac{\log P(h|e) - \log P(h)}{-\log P(h)}. \tag{B.4}$$

If $J(h,e) > s$, then

$$\frac{\log P(h|e)}{\log P(h)} < 1 - s,$$

and, since $\log P(h)$ is negative, it follows that

$$\log P(h|e) > (1 - s)\log P(h) = \log\bigl[P(h)\bigr]^{1-s},$$

which entails

$$P(h|e) > \bigl[P(h)\bigr]^{1-s}.$$

With $q$ in place of $h$ and $A_1$ in place of $e$, this reads

$$P(q|A_1) > \bigl[P(q)\bigr]^{1-s},$$

which is the inequality (6.12) of Chapter 6.
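The measure (B.4) is simple enough to exercise directly. This Python sketch, with hypothetical probabilities, checks the closure property for two independent hypotheses with $P(h_i|e) = 0.8$ and $P(h_i) = 0.3$, and confirms on a few sample values that $J(h,e) > s$ agrees with $P(h|e) > [P(h)]^{1-s}$ for $s = 0.5$. The function name `shogenji_j` is our own label.

```python
import math


def shogenji_j(p_h_given_e, p_h):
    """Measure (B.4): J(h, e) = 1 - log P(h|e) / log P(h), for 0 < P(h) < 1."""
    return 1 - math.log(p_h_given_e) / math.log(p_h)


# Hypothetical values: P(h_i|e) = x and P(h_i) = y for i = 1, 2, with h_1 and h_2
# independent both conditionally on e and unconditionally.
x, y = 0.8, 0.3
j_single = shogenji_j(x, y)
j_conj = shogenji_j(x * x, y * y)   # P(h1 & h2 | e) = x^2 and P(h1 & h2) = y^2
print(j_single, j_conj)             # equal up to rounding

# Threshold form derived above: J(h, e) > s exactly when P(h|e) > P(h)**(1 - s).
s = 0.5
samples = (0.4, 0.55, 0.7, 0.9)     # hypothetical values of P(h|e)
agreement = [(shogenji_j(p, y) > s) == (p > y ** (1 - s)) for p in samples]
print(agreement)
```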


Appendix C 
Washing Out of the Prior 


There is a much-vaunted escape clause that Bayesians use when they are 
confronted with an unsatisfactory feature of their method. The unsatisfactory 
feature is that the final, or posterior, probability of a hypothesis depends on
its prior probability, which is to a large extent arbitrary. The escape clause is 
that repeated updatings of the same hypothesis by more and more evidence 
lead, in favourable circumstances, to the ‘washing out’ of the prior, i.e. the 
insensitivity of the final posterior probability to the precise value that the 
prior probability might have. In the formally infinite limit the posterior is 
independent of the prior. 

A probabilistic regress, within the usual class, has a superficially similar 
property that we have dubbed ‘fading foundations’. The probability of the 
target depends less and less on the probability of the ground as the chain of 
propositions becomes longer and longer, and in the formal limit of an infinite 
regress it is independent of the ground. 

In the next section Bayesian washing out is explained in intuitive terms, 
and then an example is given concerning the bias of a bent coin. In Section
C.3 we point out in detail why Bayesian washing out is quite different from 
fading foundations. 


C.1 Washing out 


Suppose we have some evidence, $e_1$, for a hypothesis, $p_1$, and we can calculate the likelihood with which $e_1$ would obtain if hypothesis $p_1$ were true, namely $P(e_1|p_1)$. What we want is rather the probability that $p_1$ is correct, given that $e_1$ is true, and this we calculate from Bayes’ formula:
$$P(p_1|e_1) = \frac{P(e_1|p_1)\,P_0(p_1)}{P(e_1)} \tag{C.1}$$
Here $P_0(p_1)$ is the prior probability that is accorded to the hypothesis $p_1$: some Bayesians allow this to depend wholly on whim, others require it to be determined by some previous knowledge of the situation in question. In any case $P_0(p_1)$ is to be superseded by the posterior probability, or update, $P_1(p_1) = P(p_1|e_1)$. The denominator in (C.1) can be computed from the rule of total probability:


$$P(e_1) = P(e_1|p_1)\,P_0(p_1) + P(e_1|\neg p_1)\,\bigl[1 - P_0(p_1)\bigr], \tag{C.2}$$


on condition that the likelihood $P(e_1|\neg p_1)$ can also be calculated. More generally, if $\{p_i\}$, $i = 1, 2, \dots, n$, is a partition of the space of hypotheses for the situation in question, i.e. $p_i \wedge p_j$ is impossible for all $i \neq j$, and the disjunction $p_1 \vee p_2 \vee \dots \vee p_n$ is the whole space, then (C.2) is replaced by


$$P(e_1) = P(e_1|p_1)\,P_0(p_1) + P(e_1|p_2)\,P_0(p_2) + \dots + P(e_1|p_n)\,P_0(p_n). \tag{C.3}$$


Suppose now that some new evidence, $e_2$, comes in. The old posterior probability, $P_1(p_1) = P(p_1|e_1)$, serves as the new prior, and the new posterior probability is $P_2(p_1) = P_1(p_1|e_2) = P(p_1|e_1 \wedge e_2)$. After $m$ pieces of new evidence have come in, $P_m(p_1) = P(p_1|e_1 \wedge e_2 \wedge \dots \wedge e_m)$ is the final posterior probability. The idea is that, if $p_1$ is the correct hypothesis, and $p_2, p_3, \dots, p_n$ are all incorrect hypotheses, the likelihood $P(e_1 \wedge e_2 \wedge \dots \wedge e_m|p_1)$ will become larger and larger, relative to the other likelihoods, as more and more data comes in, that is, as $m$ increases, and all the $P(e_1 \wedge e_2 \wedge \dots \wedge e_m|p_i)$ with $i \neq 1$ will become smaller and smaller in comparison. This means that, for large $m$, $P(e_1 \wedge e_2 \wedge \dots \wedge e_m)$ will be equal to $P(e_1 \wedge e_2 \wedge \dots \wedge e_m|p_1)\,P_0(p_1)$ in good approximation, since the other terms, depending on $p_2, p_3, \dots, p_n$, will be negligible. Hence $P(p_1|e_1 \wedge e_2 \wedge \dots \wedge e_m)$ will be close to 1, and it thus may be expected that


$$P_m(p_1) = P(p_1|e_1 \wedge e_2 \wedge \dots \wedge e_m) \tag{C.4}$$


will tend to 1 in the limit. Note that the original prior probability, $P_0(p_1)$, has cancelled, that is to say, it has ‘washed out’.
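The washing out described above can be watched in a small discrete example. In this Python sketch everything is hypothetical: three candidate biases form the partition $\{p_i\}$, two deliberately opposed priors are updated, and the evidence consists of repeated runs of ten tosses, each showing seven heads. Each step applies (C.1), with the denominator computed as the sum (C.3); the binomial coefficient is dropped because it cancels on normalization.

```python
# Hypothetical partition {p_i}: three candidate biases of a coin.
biases = [0.3, 0.5, 0.7]


def update(prior, heads, tosses):
    """One application of (C.1) for the evidence 'heads in tosses'.

    The binomial coefficient is omitted because it cancels on normalization.
    """
    unnorm = [pr * p**heads * (1 - p) ** (tosses - heads) for pr, p in zip(prior, biases)]
    total = sum(unnorm)  # plays the role of the denominator (C.3)
    return [w / total for w in unnorm]


# Two hypothetical, deliberately opposed priors over the three biases.
prior_a = [0.90, 0.09, 0.01]
prior_b = [0.01, 0.09, 0.90]

# Hypothetical evidence: 30 runs of 10 tosses, each run showing 7 heads.
post_a, post_b = prior_a, prior_b
for _ in range(30):
    post_a = update(post_a, 7, 10)
    post_b = update(post_b, 7, 10)

print(post_a)
print(post_b)  # both now put almost all weight on the bias 0.7
```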

This was a quick and dirty explanation of how repeated Bayesian updat- 
ings can be expected to lead one to the true hypothesis as more and more 
evidence is accumulated. A more sophisticated treatment can be found for 
example in the Stanford Encyclopedia of Philosophy (Hawthorne 2014). We 
shall now exhibit this effect explicitly in the case of a bent coin that is tossed 
repeatedly, the purpose being to ascertain its bias. 
C.2 Example: a bent coin


Suppose that $p$ is the bias of a bent coin, i.e. the probability that heads will come up when the coin is tossed. Let $e_1$ stand for the evidence that $h_1$ heads have turned up in $n_1$ tosses. The likelihood $P(e_1|p)$, the conditional probability that $e_1$ would result, is

$$P(e_1|p) = \frac{n_1!}{h_1!\,(n_1 - h_1)!}\,p^{h_1}(1-p)^{n_1-h_1},$$

the factor involving the factorials being the number of different ways that $h_1$ heads can turn up in $n_1$ tosses.

Suppose though that the bias, $p$, is unknown. We are interested in the inverse conditional probability, $P(p|e_1)$, i.e. the probability (density) that the bias has the value $p$, given the evidence $e_1$. Here is Bayes’s theorem again:

$$P(p|e_1) = \frac{P(e_1|p)\,P_0(p)}{P(e_1)}. \tag{C.5}$$

As before, $P_0(p)$ is the Bayesian prior, a subjective guess which is to be updated by (C.5), on the basis of the evidence, $e_1$. Strictly speaking, $P_0(p)$ is not a probability, but rather a probability density: the prior probability that the bias lies between $p$ and $p + dp$ is the infinitesimal $P_0(p)\,dp$.

The denominator in (C.5) can be written as a continuous partition of the probability space as follows:

$$P(e_1) = \int_0^1 dp\, P(e_1|p)\,P_0(p) = \frac{n_1!}{h_1!\,(n_1 - h_1)!}\int_0^1 dp\, p^{h_1}(1-p)^{n_1-h_1}\,P_0(p). \tag{C.6}$$
This takes the place of the sum (C.3) in the discrete case that was considered 
in the previous section. Following the exposition of Howson and Urbach 
(2006), we will insert for the prior a so-called beta distribution: 


$$P_0(p) = B(u,v)\, p^{u-1}(1-p)^{v-1}, \tag{C.7}$$


where u and v are to be regarded as free parameters that can be varied to 
give an idea of the arbitrariness that is inherent in the Bayesian approach, 
and where B(u,v) is a normalization factor that need not be specified, since 
it will cancel. The adoption of (C.7) as the prior has no good justification, 
except the rather lame 

... beta distributions take on a wide variety of shapes, depending on the values of two parameters, u and v, enabling you to choose a beta distribution that best approximates your actual distribution of beliefs.¹


1 Howson and Urbach 2006, 242. 
On substituting (C.7) into (C.6), we find that we can evaluate the integral. It is in fact a beta-function (and that is the main reason, but of course not a justification, for choosing (C.7) in the first place). The result is

$$P(p|e_1) = \frac{(n_1 + u + v - 1)!}{(h_1 + u - 1)!\,(n_1 - h_1 + v - 1)!}\,p^{h_1+u-1}(1-p)^{n_1-h_1+v-1}. \tag{C.8}$$


This is the posterior probability density corresponding to the value $p$. That is not quite what we were looking for, since it does not give one value for the probability associated with our bent coin, but rather a whole spread of values. But this is as it should be: one single value for the probability is not singled out as the only possibility. We need to calculate the mean value of $p$ according to the distribution (C.8), which will serve as our estimate of the sought-for probability, and the standard deviation, which will indicate how uncertain the estimate is.
Straightforward calculations yield

$$p_B = E[p] = \frac{h_1 + u}{n_1 + u + v}, \qquad \sigma_B^2 = E\bigl[(p - p_B)^2\bigr] = \frac{p_B\,q_B}{n_1 + u + v + 1},$$

where $q_B = 1 - p_B$, and where the subscript ‘B’ is to remind us that the mean and standard deviation here are Bayesian estimates. The uniform prior, $P_0(p) = 1$, which corresponds to the choice $u = 1 = v$, gives the mean $p_B = (h_1 + 1)/(n_1 + 2)$, which is the celebrated result of Laplace. Let us however keep $u$ and $v$ general, since the Laplacean choice is merely one of an infinite number of possibilities.
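The closed forms for $p_B$ and $\sigma_B$ can be checked against direct numerical integration of the posterior density (C.8). The Python sketch below assumes hypothetical data (7 heads in 10 tosses) and hypothetical prior parameters; the numerical check uses a plain midpoint rule on the unnormalized density $p^{h_1+u-1}(1-p)^{n_1-h_1+v-1}$, so it also works for non-integer $u$ and $v$.

```python
import math


def beta_posterior_stats(heads, tosses, u, v):
    """Closed-form mean p_B and standard deviation sigma_B of the posterior (C.8)."""
    a, b = heads + u, tosses - heads + v
    mean = a / (a + b)                     # (h_1 + u) / (n_1 + u + v)
    var = mean * (1 - mean) / (a + b + 1)  # p_B q_B / (n_1 + u + v + 1)
    return mean, math.sqrt(var)


def numeric_stats(heads, tosses, u, v, grid=20000):
    """The same quantities by midpoint-rule integration of the unnormalized density."""
    a, b = heads + u, tosses - heads + v
    ps = [(k + 0.5) / grid for k in range(grid)]
    w = [p ** (a - 1) * (1 - p) ** (b - 1) for p in ps]
    z = sum(w)  # normalization; the constant factor in (C.8) cancels
    mean = sum(p * wi for p, wi in zip(ps, w)) / z
    var = sum((p - mean) ** 2 * wi for p, wi in zip(ps, w)) / z
    return mean, math.sqrt(var)


# Hypothetical data: 7 heads in 10 tosses, with Laplace's uniform prior u = v = 1.
print(beta_posterior_stats(7, 10, 1, 1))  # mean equals (7 + 1) / (10 + 2)
print(numeric_stats(7, 10, 1, 1))
```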

Suppose that a second run of $n_2$ tosses is made, in which $h_2$ heads come up, and take the posterior density (C.8) after the first run of $n_1$ tosses as the prior density for the second run. With $e_2$ denoting the evidence relating to the latter run, it is evident that the new posterior probability density will be

$$P(p|e_1 \wedge e_2) \propto p^{h_1+h_2+u-1}(1-p)^{n_1+n_2-h_1-h_2+v-1},$$


where we have suppressed the normalization factor. More generally, after many runs, with sequential updating, the posterior probability density is proportional to

$$p^{h+u-1}(1-p)^{n-h+v-1},$$


where $n$ is the total number of tosses in all the runs, and $h$ is the total number of heads that have come up. The mean is $(h+u)/(n+u+v)$, which becomes closer and closer to the relative frequency, $h/n$, as $n$ increases, the standard deviation becoming smaller and smaller. The prior, specified by the constants $u$ and $v$, washes out in the limit.
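Sequential updating with a beta prior amounts to adding each run's heads to $u$ and its tails to $v$, as the two-run exponents above show. The Python sketch below, with hypothetical data (500 runs of 20 tosses, 6 heads each) and three hypothetical priors, shows the posterior means all settling near the relative frequency $0.3$, and that updating run by run gives the same exponents as a single batch update.

```python
def sequential_beta_update(u, v, runs):
    """Update the exponents of p^(u-1) (1-p)^(v-1) run by run.

    Each run of (heads, tosses) adds its heads to u and its tails to v, which is
    how the exponents grow in the two-run formula above.
    """
    for heads, tosses in runs:
        u, v = u + heads, v + (tosses - heads)
    return u, v


def posterior_mean(u, v):
    """Mean u / (u + v) of a density proportional to p^(u-1) (1-p)^(v-1)."""
    return u / (u + v)


# Hypothetical data: 500 runs of 20 tosses with 6 heads each (relative frequency 0.3).
runs = [(6, 20)] * 500
priors = [(1, 1), (50, 2), (2, 50)]          # three hypothetical (u, v) choices
means = {prior: posterior_mean(*sequential_beta_update(*prior, runs)) for prior in priors}
print(means)
```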

So the success of repeated Bayesian updating lies simply in its tending 
to the relative frequency. A statistician might well be forgiven for pointing 
out that one does not need a Bayesian prior and the rigmarole of Bayesian 
updating to come to the conclusion that the expected value of the ratio of the 
number of heads to the number of tosses is equal to the bias of the coin. 


C.3 Washing out is not fading away 


At first sight there might seem to be a similarity between: 


1. The washing out of the prior in Bayesian updating, that is the indepen- 
dence in the infinite limit of the posterior on the prior, and 

2. The fading of the foundation in a probabilistic regress, that is the indepen- 
dence in the infinite limit of the target probability on the probability of the 
ground. 


However, the two effects are very different. In Bayesian updating the Bayes formula (C.1) involves the computation of $P(p_1|e_1)$ in terms of the inverse conditional probability, $P(e_1|p_1)$, followed by $P(p_1|e_1 \wedge e_2)$, and so on, as more evidence accumulates. This is quite different from our calculation of $P(q)$, in which there is no inversion à la Bayes, but rather a sequence of propositions, $A_1, A_2, \dots$, that follow one another in a linear chain. In a sense
the dissimilarity between the two could not have been greater. Fading foun- 
dations implies that the more distant propositions in the chain have less in- 
fluence on the probability of the target than do the first few propositions. In 
Bayesian updating, on the contrary, the various pieces of evidence, although 
they are introduced one after another, are actually all on a par, as can be seen 
from (C.4). 


Appendix D 
Fixed-Point Methods 


In Section 3.4 we analyzed the one-dimensional uniform chain of propo- 
sitions by summing a geometrical series. Below, in D.1, we show how a 
fixed-point method can be used to obtain the same result. This serves as an 
introduction to the fixed-point analysis in D.2 of the more complicated case 
of the two-dimensional uniform network that was discussed in Section 8.4. 

It is important to note that the fixed-point method, both in the one- 
dimensional and the many-dimensional cases, only works if there is uni- 
formity, i.e. if the conditional probabilities remain the same throughout the 
chain or network. The analysis that we gave for the one-dimensional chain 
in the text is therefore more general, since it also applies if the conditional 
probabilities are not uniform. 


D.1 Linear iteration 


In Section 3.4 we considered a recursion relation for a uniform chain of propositions that has the form

$$
\begin{aligned}
P(A_n) &= \alpha P(A_{n+1}) + \beta P(\neg A_{n+1})\\
&= \beta + (\alpha - \beta)\,P(A_{n+1}).
\end{aligned} \tag{D.1}
$$


Here $A_0$ is the target proposition, which we sometimes wrote as $q$. In Section 3.4 we explained how to calculate $P(q)$ by summing a geometric series.

Here is the same story in terms of fixed points. The question is: is there a special value of $P(A_{n+1})$, say $p_*$, such that if we plug it into the right-hand side of (D.1), the very same value, $p_*$, results for $P(A_n)$? Indeed there is, for a unique solution of the equation

$$p_* = \beta + (\alpha - \beta)\,p_* \tag{D.2}$$

exists, namely

$$p_* = \frac{\beta}{1 - \alpha + \beta},$$


given that the condition of probabilistic support implies $0 < \alpha - \beta < 1$. This agrees with what we found in Section 3.7 for $P(q)$. We still have to do more work, however, before concluding that $p_*$ is an attracting fixed point of the iteration (D.1); and it will be salutary to sketch what is involved. From (D.1) and (D.2) we see that


$$P(A_n) - p_* = (\alpha - \beta)\bigl(P(A_{n+1}) - p_*\bigr).$$


Since $\alpha - \beta$ is less than one, it follows that the distance between $P(A_n)$ and $p_*$, if it is not zero, will be less than the distance between $P(A_{n+1})$ and $p_*$; and the distance between $P(A_{n-1})$ and $p_*$ will be smaller still. If we start the iteration at a very large value of $n$, and iterate down to $n = 0$, that is down to the target proposition $q$, we will find that

$$P(q) - p_* = (\alpha - \beta)^{n+1}\bigl(P(A_{n+1}) - p_*\bigr).$$


Because $(\alpha - \beta)^{n+1}$ will be very small for large $n$, it is the case that, whatever value we choose for $P(A_{n+1})$, the difference between $P(q)$ and $p_*$ will be tiny; and, in the limit of infinite $n$, that is for an infinite chain of bacterial ancestors, $P(q) = p_*$.

Here $p_*$ is the attracting fixed point of the iteration (D.1). One can express the essence of this as follows:

$$p' = \beta + (\alpha - \beta)\,p,$$

where one starts with some value for $p$, and then puts the resulting value of $p'$ back into the right-hand side as a new value for $p$. This procedure is repeated ad infinitum in thought. The fixed point $p_*$ attracts $p$ to itself in this process.
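The fixed-point picture for the linear iteration can be reproduced in a few lines of Python. The sketch assumes hypothetical uniform values $\alpha = 0.9$ and $\beta = 0.2$, starts from several arbitrary values for the remote $P(A_{n+1})$, and compares the deviation of the result from $p_* = \beta/(1-\alpha+\beta)$ with the factor $(\alpha-\beta)^{n+1}\bigl(P(A_{n+1}) - p_*\bigr)$ derived above.

```python
def iterate_linear(alpha, beta, p_start, steps):
    """Apply p' = beta + (alpha - beta) p repeatedly, i.e. walk (D.1) down a uniform chain."""
    p = p_start
    for _ in range(steps):
        p = beta + (alpha - beta) * p
    return p


alpha, beta = 0.9, 0.2                 # hypothetical uniform conditional probabilities
p_star = beta / (1 - alpha + beta)     # fixed point beta / (1 - alpha + beta) = 2/3
steps = 60                             # plays the role of n + 1 in the text

for p_start in (0.0, 0.5, 1.0):        # arbitrary values for the remote P(A_{n+1})
    p_q = iterate_linear(alpha, beta, p_start, steps)
    # Deviation predicted by P(q) - p_* = (alpha - beta)^(n+1) (P(A_{n+1}) - p_*).
    predicted = (alpha - beta) ** steps * (p_start - p_star)
    print(p_start, p_q, p_q - p_star, predicted)
```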




D.2 Quadratic Iteration 


In Section 8.3 we obtained the recursion relation (8.9), namely

$$P(A_n) = \alpha_n P^2(A_{n+1}) + \beta_n P^2(\neg A_{n+1}) + (\gamma_n + \delta_n)\,P(A_{n+1})\,P(\neg A_{n+1}). \tag{8.9}$$
Using the fact that the conditional probabilities $\alpha_n$, $\beta_n$, $\gamma_n$ and $\delta_n$ are all non-negative, we see from (8.9) that, if $P(A_{n+1})$ lies within the unit interval (so that $P(\neg A_{n+1})$ does so too), then $P(A_n) \geq 0$. Moreover, since $\alpha_n$, $\beta_n$, $\gamma_n$ and $\delta_n$ are each not greater than one,


$$
\begin{aligned}
P(A_n) &\leq P^2(A_{n+1}) + P^2(\neg A_{n+1}) + 2P(A_{n+1})\,P(\neg A_{n+1})\\
&= \bigl[P(A_{n+1}) + P(\neg A_{n+1})\bigr]^2 = 1.
\end{aligned} \tag{D.3}
$$


Thus we have demonstrated that $0 \leq P(A_{n+1}) \leq 1$ entails $0 \leq P(A_n) \leq 1$, and this means that the quadratic iteration will not run amok: the probabilities remain within the unit interval, as they should; and the question is whether $P(A_0)$ tends to a limit as the length of the chain tends to infinity, or whether it wanders around indefinitely.

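When the conditional probabilities do not change from link to link, we may drop the indices; and, with the substitution of $1 - P(A_{n+1})$ for $P(\neg A_{n+1})$ in (8.9), we obtain

$$
\begin{aligned}
P(A_n) &= \alpha P^2(A_{n+1}) + \beta\bigl(1 - P(A_{n+1})\bigr)^2 + (\gamma + \delta)\bigl(P(A_{n+1}) - P^2(A_{n+1})\bigr)\\
&= \beta + 2(\varepsilon - \beta)\,P(A_{n+1}) + (\alpha + \beta - 2\varepsilon)\,P^2(A_{n+1}),
\end{aligned} \tag{D.4}
$$

with $\varepsilon = \tfrac{1}{2}(\gamma + \delta)$. On condition that $\alpha + \beta \neq 2\varepsilon$, define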


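$$
\begin{aligned}
q_n &= (\alpha + \beta - 2\varepsilon)\,P(A_n) - \beta + \varepsilon\\
&= (\alpha + \beta - 2\varepsilon)\bigl[\beta + 2(\varepsilon - \beta)P(A_{n+1}) + (\alpha + \beta - 2\varepsilon)P^2(A_{n+1})\bigr] - \beta + \varepsilon\\
&= \beta(\alpha + \beta - 2\varepsilon) - \beta + \varepsilon\\
&\quad + 2(\varepsilon - \beta)(\alpha + \beta - 2\varepsilon)\,P(A_{n+1}) + (\alpha + \beta - 2\varepsilon)^2 P^2(A_{n+1}).
\end{aligned} \tag{D.5}
$$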


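This definition of $q_n$ also implies that

$$
\begin{aligned}
q_{n+1}^2 &= \bigl[(\alpha + \beta - 2\varepsilon)\,P(A_{n+1}) - \beta + \varepsilon\bigr]^2\\
&= (\varepsilon - \beta)^2 + 2(\varepsilon - \beta)(\alpha + \beta - 2\varepsilon)\,P(A_{n+1}) + (\alpha + \beta - 2\varepsilon)^2 P^2(A_{n+1}).
\end{aligned} \tag{D.6}
$$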


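Comparing (D.5) with (D.6), we see that $q_n = c + q_{n+1}^2$, where

$$
\begin{aligned}
c &= \beta(\alpha + \beta - 2\varepsilon) - \beta + \varepsilon - (\varepsilon - \beta)^2\\
&= \varepsilon(1 - \varepsilon) - \beta(1 - \alpha),
\end{aligned} \tag{D.7}
$$

which is Eq.(8.15).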
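The algebra leading from (8.9) to $q_n = c + q_{n+1}^2$ can be spot-checked numerically. This Python sketch assumes hypothetical uniform values of $\alpha$, $\beta$, $\gamma$ and $\delta$; it verifies that the rearranged form (D.4) agrees with the uniform version of (8.9), and that the change of variable turns one step of (D.4) into $q_n = c + q_{n+1}^2$ with $c$ given by (D.7).

```python
def recursion_89(p, alpha, beta, gamma, delta):
    """Uniform form of (8.9), with P(not A_{n+1}) = 1 - P(A_{n+1})."""
    return alpha * p**2 + beta * (1 - p) ** 2 + (gamma + delta) * p * (1 - p)


def recursion_d4(p, alpha, beta, eps):
    """Rearranged form (D.4): beta + 2(eps - beta) p + (alpha + beta - 2 eps) p^2."""
    return beta + 2 * (eps - beta) * p + (alpha + beta - 2 * eps) * p**2


def to_q(p, alpha, beta, eps):
    """Change of variable: q = (alpha + beta - 2 eps) P - beta + eps."""
    return (alpha + beta - 2 * eps) * p - beta + eps


# Hypothetical uniform conditional probabilities for the two-dimensional network.
alpha, beta, gamma, delta = 0.8, 0.1, 0.5, 0.3
eps = 0.5 * (gamma + delta)
c = eps * (1 - eps) - beta * (1 - alpha)          # Eq. (D.7)

for p_next in (0.0, 0.25, 0.6, 1.0):              # sample values of P(A_{n+1})
    p_n = recursion_d4(p_next, alpha, beta, eps)
    q_n = to_q(p_n, alpha, beta, eps)
    q_next = to_q(p_next, alpha, beta, eps)
    print(p_n, recursion_89(p_next, alpha, beta, gamma, delta), q_n, c + q_next**2)
```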
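Since $0 < \beta < \alpha < 1$ and $0 \leq \varepsilon \leq 1$, it follows that

$$c < \varepsilon(1 - \varepsilon) = \tfrac{1}{4} - \bigl(\tfrac{1}{2} - \varepsilon\bigr)^2 \leq \tfrac{1}{4},$$

$$c \geq -\beta(1 - \alpha) > -\alpha(1 - \alpha) = \bigl(\alpha - \tfrac{1}{2}\bigr)^2 - \tfrac{1}{4} \geq -\tfrac{1}{4}.$$

So we have shown that

$$-\tfrac{1}{4} < c < \tfrac{1}{4}. \tag{D.8}$$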
A fixed point of the iteration

$$q_n = c + q_{n+1}^2 \tag{D.9}$$

is

$$q_* = \frac{1 - \sqrt{1 - 4c}}{2} = \frac{2c}{1 + \sqrt{1 - 4c}},$$

as can be readily verified by substitution, and to find the domain in which this fixed point is attracting, we define $s_n = q_n - q_*$; and, rewriting $q_n = c + q_{n+1}^2$ in terms of $s_n$, we have

$$s_n = s_{n+1}\bigl[1 - \sqrt{1 - 4c} + s_{n+1}\bigr]. \tag{D.10}$$


On condition that

$$\bigl|1 - \sqrt{1 - 4c}\bigr| < 1, \tag{D.11}$$

and $s_{n+1}$ is very small, we conclude that $q_*$ is attracting. Indeed, since

$$s_n - s_{n+1} = (s_{n+1} - s_{n+2})\bigl[1 - \sqrt{1 - 4c} + s_{n+1} + s_{n+2}\bigr],$$


the mapping (D.10) is a contraction if $|s_n| < y$ and $\bigl|1 - \sqrt{1 - 4c}\bigr| + 2y < 1$. This implies that $y < \sqrt{\tfrac{1}{4} - c}$ when $0 \leq c < \tfrac{1}{4}$, and $y < 1 - \sqrt{\tfrac{1}{4} - c}$ when $-\tfrac{3}{4} < c < 0$. Hence if $|s_N| < y$ for very large $N$, and $y$ satisfies the above contraction constraint, the iteration backwards to $s_0$ will be attracted to zero, that is to say $q_0$ will be attracted to $q_*$. The domain of attraction of the fixed point is $-\tfrac{3}{4} < c < \tfrac{1}{4}$, and this covers the interval (D.8).
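The behaviour of the quadratic map can be explored with a short Python sketch. The values of $c$ and the starting offsets are hypothetical; for $c$ inside $(-\tfrac{3}{4}, \tfrac{1}{4})$ the iterates approach $q_* = \tfrac{1}{2}(1 - \sqrt{1-4c})$, while for $c = -1$, outside that interval, the iteration started from $0$ falls into a two-cycle instead of settling on a fixed point.

```python
import math


def iterate_q(c, q_start, steps):
    """Apply q_n = c + q_{n+1}^2 (D.9) `steps` times, starting from a remote value."""
    q = q_start
    for _ in range(steps):
        q = c + q * q
    return q


def q_star(c):
    """The fixed point (1 - sqrt(1 - 4c)) / 2 of q = c + q^2, real for c <= 1/4."""
    return 0.5 * (1 - math.sqrt(1 - 4 * c))


# Hypothetical values of c inside (-3/4, 1/4), each started 0.05 away from q_*.
for c in (0.2, 0.0, -0.2, -0.6):
    print(c, iterate_q(c, q_star(c) + 0.05, 2000), q_star(c))

# Outside the domain of attraction, e.g. c = -1, the iteration started at 0
# oscillates between 0 and -1 instead of approaching q_star(-1).
print([iterate_q(-1.0, 0.0, k) for k in range(6)])  # [0.0, -1.0, 0.0, -1.0, 0.0, -1.0]
```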

Going back to the original form (D.4) of the iteration, we find that the solution $q_*$ corresponds to the fixed point

$$p_* = \frac{\beta}{\beta + \tfrac{1}{2} - \varepsilon + \sqrt{\beta(1 - \alpha) + \bigl(\varepsilon - \tfrac{1}{2}\bigr)^2}}. \tag{D.12}$$
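In the limit that $\beta$ tends to zero, the denominator of (D.12) tends to $1 - 2\varepsilon$ if $\varepsilon < \tfrac{1}{2}$, so that $p_*$ tends to zero. However, if $\varepsilon > \tfrac{1}{2}$ we find the nontrivial value

$$p_* = \frac{2\varepsilon - 1}{2\varepsilon - \alpha}. \tag{D.13}$$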
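The closed form (D.12) and the small-$\beta$ limits can be compared with direct iteration of (D.4). This Python sketch uses hypothetical values $\alpha = 0.8$, $\beta = 0.1$, $\varepsilon = 0.4$ (for which $c = 0.22$ lies in the interval (D.8)); it iterates from several starting values of the remote probability and then evaluates (D.12) at a very small $\beta$, to compare with the limiting behaviour for $\varepsilon < \tfrac{1}{2}$ and with (D.13) for $\varepsilon > \tfrac{1}{2}$.

```python
import math


def next_p(p, alpha, beta, eps):
    """One step of the uniform quadratic recursion (D.4)."""
    return beta + 2 * (eps - beta) * p + (alpha + beta - 2 * eps) * p**2


def p_star(alpha, beta, eps):
    """Closed-form fixed point (D.12)."""
    root = math.sqrt(beta * (1 - alpha) + (eps - 0.5) ** 2)
    return beta / (beta + 0.5 - eps + root)


def iterate(alpha, beta, eps, p_start, steps=5000):
    """Walk (D.4) down the network from a remote starting probability."""
    p = p_start
    for _ in range(steps):
        p = next_p(p, alpha, beta, eps)
    return p


alpha, beta, eps = 0.8, 0.1, 0.4  # hypothetical uniform values
for start in (0.0, 0.5, 1.0):
    print(start, iterate(alpha, beta, eps, start), p_star(alpha, beta, eps))

# Very small beta: close to zero when eps < 1/2, close to (D.13) when eps > 1/2.
tiny = 1e-8
print(p_star(0.8, tiny, 0.4))
print(p_star(0.8, tiny, 0.7), (2 * 0.7 - 1) / (2 * 0.7 - 0.8))
```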